Hi guys,

It has been a while since my last post, and that’s because I had quite a busy summer: besides my day-to-day job, a few trips and conference preparations for the 2015/2016 season, I also got the chance to work with O’Reilly on one of their video trainings. In other words, I’m happy to announce my first project as a trainer for O’Reilly Media.


From their website:

O’Reilly Media started out as a technical writing and consulting company named O’Reilly & Associates. In 1984, we started retaining rights to manuals we created for Unix vendors. Our books were grounded in our hands-on experience with the technology, and we wrote them in a straightforward, conversational voice. We weren’t afraid to say in print that a vendor’s technology didn’t work as advertised. While our publishing program has expanded to include everything from digital photography to desktop applications to software engineering, those early principles still guide our editorial approach.


Just a few days ago, the team in Redmond announced the general availability of Azure Search, along with several other announcements.

For the past few months I’ve had the opportunity to talk, blog and answer questions about Azure Search while it was still in public preview. Today, however, the service is no longer in preview, which means that the search-as-a-service solution managed by Microsoft now comes with an SLA and with a stable, less frequently changing REST API schema and set of models. In short: full-text search in a box.

The purpose of Azure Search is to help software developers implement a search system within their applications (whether web, mobile or desktop) without the friction and complexity of writing queries in SQL, JavaScript or anything else, and with all the benefits of an administration-free system.

Not only did the team make the service generally available, but they also added some more flavor to this release with great new features. One is an indexer mechanism, which allows Azure Search to literally crawl for data in modern data repositories such as Azure DocumentDB, Azure SQL Database or SQL Server running on Azure VMs. Another is the concept of suggesters (previously in preview in the 2014-10-20-Preview API version; I wrote about suggesters in the Azure Search Client Library update announcement here), which allows users to specify a suggest algorithm when running the suggest operation available in Azure Search.


Over the past month I had the opportunity to talk about Azure Search at the Azure DevCamp roadshow put together by the ITCamp community, with the sponsorship of Microsoft. Not only did they put together a great event series, but I also had the chance to meet wonderful people interested in cloud computing across the entire country: Bucharest on February 13th, Oradea on February 20th, Timisoara on the 21st and Cluj-Napoca on the 28th.

Below are my slides (in English) and further down this post is the video recording of my presentation in Cluj-Napoca. For some strange reason, related to my Surface’s OS going to sleep just before my presentation and then not being able to find a particular .dll file that is part of Newtonsoft’s JSON.NET, one of my demos didn’t run as expected in Cluj-Napoca, even though everything went smoothly during the other three events. I’ve also posted a few photos from some of these events in a photo gallery at the end of this post.

If you’re interested in Azure Search, feel free to download the slides and (if you’re OK with Romanian) watch my presentation’s recording.

VIDEO: Add Professional Search Features To Your Apps, Azure DevCamp 2015, Cluj-Napoca

Looking up the word ‘facet’ in the English dictionary, I came to realize that it means way more things than I originally knew (according to http://dictionary.reference.com/browse/facet):

  1. one of the small, polished plane surfaces of a cut gem
  2. a similar surface cut on a fragment of rock by the action of water, windblown sand etc.
  3. aspect, phase, as in: ‘They carefully examined every facet of the argument’
  4. in Architecture, any of the faces of a column cut in a polygonal form
  5. in Zoology, one of the corneal lenses of a compound arthropod eye
  6. in Dentistry, a small highly burnished area, usually on the enamel surface of a tooth, produced by abrasion between opposing teeth in chewing

However, in this post I’m not going to discuss the origin of the word, but rather what a ‘facet’ means in terms of Azure Search.

Have you ever wondered how popular online stores are able to create those complex filtering scenarios, different for every category of products and different in functionality as well? More specifically, how come high-end products show pricing categories based on quite a few high price ranges, while the accessories category comes with fewer, lower price ranges? Here’s an example on Amazon.co.uk:


At the time of this writing, Azure Search Client Library, available on NuGet here, has officially been downloaded 201 times. In order to properly celebrate this, I’m going to have the next release of the library published asap. Lots of new capabilities are on their way, so stay tuned!

One more thing: should there be a specific feature you feel is missing, do drop me a comment, message, e-mail, tweet etc. and I’ll personally make sure that your request goes straight to the top of the product backlog.

Alex


The latest version of Azure Search Client Library (version 0.6.5370.1398) supports the usage of Scoring Profiles. But what are scoring profiles anyway?

What are scoring profiles?

Scoring profiles are a way for you to configure how results are ranked, based on one or more custom-defined criteria. Fortunately, Azure Search supports several types of scoring profile configuration, which means that you can define quite a complex algorithm for ranking your results. Specifically, your results could be boosted by:

  • the appearance of a specific keyword in a specific field; for example, a football match result could be boosted if the name of the match contains a specific keyword, compared to matches where only the description contains that keyword
  • the position of a specific value within a range of values; for example, if you have an index of movies and two movies contain the keyword you are searching for, you could boost the one with the higher user rating, on the assumption that people are more likely to be looking for that movie than for a low-rated one
  • the freshness of a document; in other words, a newly added document can be rated higher because it was added more recently than the stale documents which already exist within the index and contain the same set of keywords you are querying for
  • the location of a document; this is especially useful in cases in which you are querying for documents which contain geolocation data: for example, your favorite team’s matches which occur closer to you could get a better score than matches of the same team which occur on the other coast

All these scoring profiles are also supported in the Azure Search Client Library.

One of the coolest things about Scoring Profiles is that they can define a multitude of functions based on which you can boost the results and, moreover, each function used when calculating the score can have a different booster.

How do I boost results based on specific fields?

The most common way you’d probably boost your results is by having specific keywords in specific fields. For example, if you’re querying for football matches, match names which contain your keyword would probably be boosted compared to matches where only the description contains that keyword.

Using Azure Search Client Library, this is done by instantiating a Scoring object and specifying the weight of the fields.

Here’s an example:

var scoringProfile1 = new Scoring("scoreByName", SearchableEvent.GetSearchableEventFields())
{
    FunctionAggregation = FunctionAggregationTypes.Sum
};
scoringProfile1.Text.Weights["name"] = 100;

In this example, a new Scoring object is instantiated with the name of the scoring profile, “scoreByName”, and with a list of fields corresponding to the index. The name is required because the scoring profile is referenced by name when querying data.

Afterwards, a weight is applied to the field named “name”. This specifies that, when this scoring profile is used in a query, documents containing the keyword in the name field will be boosted by a weight of 100 compared to documents which contain the keyword in other fields.

How do I boost results corresponding to newer documents?

Another common scenario in search systems is to have newer documents boosted compared to stale documents. In other words, if a new document is added to the index, this specific document could be ranked higher. Considering our example of football matches, freshness boosting is useful in two ways:

  1. boosting a newly added document could help in selling more tickets to these events sooner
  2. inverted freshness boosting: events could be boosted a few days before the match occurs, thus making sure that they will be returned in better positions a few days before the event, even if their original (un-boosted) score isn’t too high

Using Azure Search Client Library, freshness boosting is applied by adding a FreshnessFunction to the list of functions within a scoring profile. Continuing the previous example, it looks like this:

var function1 = new FreshnessFunction()
{
    Boost = 20,
    BoostingDuration = new TimeSpan(0, 13, 15, 18),
    FieldName = "dateadded",
    Interpolation = InterpolationTypes.Logarithmic
};
scoringProfile1.Functions = new List() { function1 };

In this example, a new FreshnessFunction is instantiated with the following properties: the boost applied to any search results that match the keywords is 20, and the boosting is based on the field named “dateadded”, but only for 13 hours, 15 minutes and 18 seconds (according to the BoostingDuration property) after the date and time value specified in the “dateadded” field.

How do I boost results based on geolocation?

Considering our football matches example, whenever a user searches for his favorite team’s matches, matches which occur closer to his location could be boosted compared to matches which occur further away. This is also a particularly useful feature for mobile applications or location-aware web applications.

Using Azure Search Client Library, geolocation boosting can be applied after instantiating a DistanceFunction object. Here’s an example:

var function2 = new DistanceFunction()
{
    Boost = 10,
    BoostingDistance = 150,
    ReferencePointParameter = "mylocation",
    FieldName = "geolocation",
    Interpolation = InterpolationTypes.Constant,
};
scoringProfile1.Functions = new List() { function2 };

In the previous example, a DistanceFunction is used when calculating a query’s results with the scoringProfile1 scoring profile. This function instructs the scoring calculator to boost results located within 150 km of a location sent along with the query through a parameter called “mylocation”. Because of this parameter, DistanceFunction is a special function: it allows search results to be scored dynamically based on user input other than the keyword. The “geolocation” value of FieldName specifies that the field containing the location of the football match is called “geolocation”. Keep in mind, though, that this field must be of type GeographyPoint. (Note: using Azure Search Client Library version 0.6.5370.1398, you can save location data using the GeographyPoint model class. It exposes Latitude and Longitude properties, thus saving you the trouble of serializing and deserializing geolocation data.)

How do I boost results based on their rating?

It’s common for huge index repositories to boost search results based on specific values within a range. For example, in a movie database, a movie rated higher by viewers would be boosted compared to poorly rated movies (e.g. an IMDb search for “love” returns the 1969 movie “Women in Love”, rated 7.8 at the time of this writing, in 3rd position, while the 2011 title “Love Birds”, rated only 5.9, is positioned at the end of the search results page).

Boosting results based on a specific value within a specific range is called magnitude boosting and this is done by using a MagnitudeFunction. Here’s an example using Azure Search Client Library:

var function3 = new MagnitudeFunction()
{
    Boost = 1000,
    BoostingRangeStart = 9,
    BoostingRangeEnd = 10,
    ConstantBoostBeyondRange = false,
    FieldName = "rating",
    Interpolation = InterpolationTypes.Constant
};
scoringProfile1.Functions = new List() { function3 };

In this example, the magnitude function applies a boost of 1000 to documents whose “rating” field contains a value between 9 and 10.

Notes on scoring profile functions

Even though each of the previous examples assigns only a single function to the Functions collection, the Azure Search service allows you to use several (or even all) of these functions simultaneously. Moreover, there’s no restriction on using the same function type over and over again, as long as the field and/or dynamic parameters used within the function are different.

In order to use all these functions simultaneously, all you have to do is instantiate the Functions collection with all the functions, like this:

scoringProfile1.Functions = new List() { function1, function2, function3 };

Keep in mind, though, that the booster applied to a field containing a keyword is not considered a function, for a couple of reasons:

  • first, functions allow the notion of Interpolation which, as the Azure Search REST API explains it, is a way to ‘define the slope for which the score boosting increases from the start of the range to the end of the range’. This notion cannot be applied to text keyword boosting because a field either contains a specific keyword, or doesn’t
  • second, when using more functions within a scoring profile, there’s a notion of aggregating the functions in order to get the final result. As you’ll see next, there’s no point in aggregating these functions with the text booster, because documents which don’t contain the keyword won’t be returned in the search results (or, if no keyword is used, then the booster won’t be used at all, unlike functions, which are, or at least can be, still valid for empty queries)

When you specify more than one function within a scoring profile, these functions will be aggregated in order to get the final result score. By default, Azure Search aggregates the results by summing the functions’ individual scores. However, you can instruct the score calculator to use other aggregation mechanisms:

  • Maximum: only the maximum score returned by the use of a single function is used, whatever that function’s type is
  • Minimum: the exact opposite of the previous aggregation type
  • Average: rather than summing the scores, an average result will be calculated and the result will correspond to the end result; this is useful when you want to lower a result’s score if it doesn’t correspond to all the functions defined within the scoring profile
  • First matching: the first function which matches the scoring profile function definitions is used for calculating the end result; this is similar to a greedy algorithm and has the best performance, but might return invalid or unexpected search results
  • Sum: the default aggregation type; sums up all the initial scores using the functions and uses the sum result as end query result score
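
To make the aggregation types listed above more concrete, here is a minimal, self-contained sketch of how several function boosts could be combined into one value. It only illustrates the idea and is not how the service or Azure Search Client Library is implemented: the AggregationSketch enum and the ScoreAggregationSketch class are hypothetical names, and treating a function that doesn't apply to a document as contributing 0 is my own assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical names for illustration only; they are not part of
// Azure Search Client Library or the Azure Search REST API.
public enum AggregationSketch { Sum, Average, Minimum, Maximum, FirstMatching }

public static class ScoreAggregationSketch
{
    // boosts holds one value per function, in the order the functions are defined.
    // Assumption: a function that does not apply to a document contributes 0.
    public static double Aggregate(IReadOnlyList<double> boosts, AggregationSketch mode)
    {
        if (boosts.Count == 0) return 0.0;

        switch (mode)
        {
            case AggregationSketch.Sum: return boosts.Sum();
            case AggregationSketch.Average: return boosts.Average();
            case AggregationSketch.Minimum: return boosts.Min();
            case AggregationSketch.Maximum: return boosts.Max();
            case AggregationSketch.FirstMatching:
                // The first non-zero boost wins; 0 is returned if no function applies.
                return boosts.FirstOrDefault(b => b != 0.0);
            default:
                throw new ArgumentOutOfRangeException(nameof(mode));
        }
    }
}
```

With the boosts 20, 0 and 1000 (a freshness boost, a distance function that doesn't apply and a magnitude boost), this sketch yields 1020 for Sum, 340 for Average, 0 for Minimum, 1000 for Maximum and 20 for FirstMatching. The Average case shows why that type lowers the score of documents which don't satisfy all the functions.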

What happens if I don’t use a scoring profile?

If no scoring profile is used, Azure Search uses a model based on term frequency-inverse document frequency (tf-idf for short), which, according to Wikipedia, is ‘a numerical statistic that is intended to reflect how important a word is to a document in a collection’. More specifically, Azure Search currently uses Lucene’s implementation of an algebraic model called Vector Space Model.

In other words, Azure Search checks how frequent a given word is across the index (global frequency) and within the field (local frequency), and thus determines how special a given word is. From this result, it derives a specific value.

The implications of this model are:

  1. Hits of rare terms (low global frequency) will have higher scores than hits with terms that show up all over the index
  2. The more often a specific term shows up in a field (high local frequency), the higher the score for a hit to that term (within limits, however)
  3. Length normalization: a hit in a field containing two terms will rate better than a hit for the same term in the same field containing more terms (say, 10).

All the results of these calculations are then summed up to produce the score you get when you query for a document without using any scoring profile.
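
The three implications listed above can be illustrated with a toy tf-idf calculation. This is a heavily simplified sketch and not the actual Lucene or Azure Search scoring code: the TfIdfSketch class is a hypothetical name, and the formulas (square root of the term count, a logarithmic inverse document frequency and a 1/sqrt(length) normalization) are assumptions loosely modeled on the classic tf-idf approach.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified tf-idf sketch; not the real Lucene or Azure Search formula.
public static class TfIdfSketch
{
    // field: the terms of the field being scored.
    // allFields: the same field for every document in the index.
    public static double Score(string term, string[] field, IReadOnlyList<string[]> allFields)
    {
        int termCount = field.Count(t => t == term);
        if (termCount == 0) return 0.0; // no hit, no score

        // Local frequency: more occurrences help, but with diminishing returns.
        double tf = Math.Sqrt(termCount);

        // Global frequency: terms found in fewer documents are worth more.
        int docsWithTerm = allFields.Count(f => f.Contains(term));
        double idf = 1.0 + Math.Log((double)allFields.Count / (docsWithTerm + 1));

        // Length normalization: a hit in a short field counts more than in a long one.
        double lengthNorm = 1.0 / Math.Sqrt(field.Length);

        return tf * idf * lengthNorm;
    }
}
```

Scoring the same term against a two-term field and against a ten-term field returns a higher value for the shorter field, and a term that appears in only one document of the index scores higher than a term that appears in all of them.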

How do I use a scoring profile when I query my index?

Using Azure Search Client Library, when you query an index you simply have to specify a scoring profile’s name in the ScoringProfile property of the QueryParameters object. If the scoring profile requires parameters, then you also have to pass a Dictionary<string,string> object, where the keys correspond to the parameters’ names and the values correspond to the parameters’ values. Here’s an example:

var scoringParams = new Dictionary<string, string>();
scoringParams.Add("mylocation", "-122.3358423,47.6148481");
var result = await _azureSearchService.Indexes[searchIndex].QueryAsync(new QueryParameters()
    {
        QueryText = searchText,
        ScoringProfile = searchScoringProfile,
        ScoringParameters = scoringParams        
    });

Note: using Azure Search Client Library version 0.6.5370.1398, when you’re sending out a geolocation value as a scoring parameter, keep in mind that:

  • the position is serialized in LONGITUDE and LATITUDE order due to the Azure Search service’s requirements; in a future release (TBD soon), you’ll also be able to use the GeographyPoint data type to get the serialization done out-of-the-box
  • the longitude and latitude components of a coordinate must be separated by a comma
  • decimals are separated using the common English dot separator for decimals

As an additional note, also keep in mind that all scoring parameters defined within a scoring profile must be sent with the query when using that scoring profile. There’s currently no way of specifying a default scoring parameter value.
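
The formatting rules for the geolocation parameter described above (longitude first, a comma in between, a dot as the decimal separator) are easy to get wrong on machines with a non-English culture. Here is a small helper sketch that builds the parameter value; GeoParameterSketch and ToScoringParameter are hypothetical names of my own and are not part of Azure Search Client Library.

```csharp
using System.Globalization;

// Hypothetical helper; not part of Azure Search Client Library.
public static class GeoParameterSketch
{
    // Builds the "longitude,latitude" string expected as a scoring parameter value.
    // CultureInfo.InvariantCulture guarantees a dot as the decimal separator,
    // whatever the current culture of the machine is.
    public static string ToScoringParameter(double longitude, double latitude)
    {
        return string.Format(CultureInfo.InvariantCulture, "{0},{1}", longitude, latitude);
    }
}
```

For example, scoringParams.Add("mylocation", GeoParameterSketch.ToScoringParameter(-122.3358423, 47.6148481)) adds the same value as the hard-coded string in the query example above.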

Today I’m happy to announce an update to Azure Search Client Library. The new version adds tons of performance improvements, new features and bug fixes, among which full support for scoring profiles and the ability to more easily create, update and delete documents are just a few.

Moreover, because for some reason I missed doing it previously, I’ve finally added full IntelliSense support for all classes within the library. The full list of changes is available here and the NuGet package is available here or via the NuGet Package Manager.

Happy searching!

I’m super excited to announce the public availability of Azure Search Client Library, a client library which works as a wrapper around the REST API available for the Azure Search Service.

This library is intended to make your development tasks easier when it comes to managing or querying your Azure Search indexes. The client library is available on NuGet (https://www.nuget.org/packages/AzureSearchClient) and the current version is 0.5.5355.2536. I’ve also written a short Getting Started-like documentation which is available at

Among other things, this is also the very first NuGet package I have ever made publicly available on NuGet.org, so I’d appreciate it even more if you’d leave comments on whatever you’d like to see next in the package.

Alex