Just a few days ago, the team in Redmond announced the general availability of Azure Search, along with a few other announcements.

For the past few months I have had the opportunity to talk, blog and answer questions about Azure Search while it was still in public preview. Today, however, the service is no longer in preview. This means that the search-as-a-service solution managed by Microsoft is now fully baked, with an SLA and a stable, less frequently changing REST API schema and models. In short: full-text search in a box.

The purpose of Azure Search is to help software developers implement a search system within their applications (whether web, mobile or desktop) without the friction and complexity of writing queries in SQL, JavaScript or anything else, and with all the benefits of an administration-less system.

Not only did the team make the service generally available, but they also added some more flavor to this release with great new features. One is an indexer mechanism, which allows Azure Search to literally crawl for data in modern data repositories such as Azure DocumentDB, Azure SQL Database or SQL Server running on Azure VMs. Another is the concept of suggesters (previously in preview in the 2014-10-20-Preview API version; I wrote about suggesters in the Azure Search Client Library update announcement here), which allows users to specify a suggest algorithm when running the suggest operation available in Azure Search.

Over the past month I had the opportunity to talk about Azure Search at the Azure DevCamp roadshow put together by the ITCamp community, with the sponsorship of Microsoft. Not only did they put together a great event series, but I also had the chance to meet wonderful people interested in cloud computing across the entire country: Bucharest on February 13th, Oradea on February 20th, Timisoara on the 21st and Cluj-Napoca on the 28th.

Below are my slides (in English) and further down this post is the video recording of my presentation in Cluj-Napoca. For some strange reason, related to my Surface’s OS going to sleep just before my presentation and then not being able to find a particular .dll file that is part of Newtonsoft’s JSON.NET, one of my demos didn’t run as expected in Cluj-Napoca, even though everything went smoothly during the other three events. I’ve also posted a few photos from some of these events in a photo gallery at the end of this post.

If you’re interested in Azure Search, feel free to download the slides and (if you’re OK with Romanian) watch my presentation’s recording.

VIDEO: Add Professional Search Features To Your Apps, Azure DevCamp 2015, Cluj-Napoca

I’m happy to announce that ASCL (Azure Search Client Library) has received a new update, namely 0.8.5522.36498. Using the newer version you can now enjoy suggestion algorithms without worrying about the little bugs :), get suggestions through the freshly announced ‘Suggesters’ functionality, use tag boosting and take complete advantage of the multi-lingual support of Azure Search.

Along with this update I’ve also written two new ‘Getting started’ projects which help you better understand how to use ASCL.

Happy downloading: https://www.nuget.org/packages/AzureSearchClient/0.8.5522.36498

Along with fellow community leaders and speakers and with the support of Microsoft Romania, I’m putting together the first community-organized Azure-centric event in Oradea for 2015!

Come and join us at Azure DevCamp Oradea

Part of a series of seven events taking place across the entire country (Bucharest, Oradea, Timisoara, Targu-Mures, Cluj-Napoca, Sibiu, Brasov), Azure DevCamp Oradea is your chance to learn more about the freshly announced services in Azure:

  • Azure Search
  • Azure as a backend for cross-platform mobile apps
  • BigData in Azure: HDInsight
  • Azure for the Enterprise

These sessions will be presented by (in alphabetical order) Ciprian Jichici (Microsoft Regional Director), Alex Mang (Azure Advisor), Mihai Tataran (Microsoft Most Valuable Professional) and Radu Vunvulea (Microsoft Most Valuable Professional).

Azure DevCamp Oradea will take place on the 20th of February, at Hotel Continental Forum (1 Aleea Standului), will start at exactly 16:00 and is completely FREE of charge. However, registration is required prior to the event and can only be done at http://aka.ms/oradea-20-februarie.

Register today, seats are limited!

The latest version of Azure Search Client Library (version 0.6.5370.1398) supports the usage of Scoring Profiles. But what are scoring profiles anyway?

What are scoring profiles?

Scoring profiles are a way for you to configure how results are ranked, based on one or more custom-defined criteria. Fortunately, Azure Search supports several types of scoring profile configuration, which means that you can define quite a complex algorithm for ranking your results. Specifically, your results could be boosted by:

  • the appearance of a specific keyword in a specific field; for example, a football match result could be boosted if the name of the match contains a specific keyword, compared to matches where only the description contains that keyword
  • the appearance of a specific value within a range of values; for example, if you have an index of movies and two movies contain the keyword you are searching for, you could boost the one with the higher user rating, on the assumption that people would rather find that movie than a low-rated one
  • the freshness of a document; in other words, a newly added document could be ranked higher because it was added more recently than the stale documents which already exist within the index and contain the same set of keywords you are querying for
  • the location of a document; this is especially useful in cases in which you are querying for documents which contain geolocation data: for example, your favorite team’s matches which occur closer to you could get a better score than matches of the same team which occur on the other coast

All these scoring profiles are also supported in the Azure Search Client Library.

One of the coolest things about Scoring Profiles is that they can define a multitude of functions based on which you can boost the results and, moreover, each function used when calculating the score can have a different booster.

How do I boost results based on specific fields?

The most common way you’d probably boost your results is by having specific keywords in specific fields. For example, if you’re querying for football matches, match names which contain your keyword would probably be boosted compared to matches where only the description contains that keyword.

Using Azure Search Client Library, this is done by instantiating a Scoring object and specifying the weight of the fields.

Here’s an example:

var scoringProfile1 = new Scoring("scoreByName", SearchableEvent.GetSearchableEventFields())
{
    FunctionAggregation = FunctionAggregationTypes.Sum
};
scoringProfile1.Text.Weights["name"] = 100;

In this example, a new Scoring object is instantiated with “scoreByName” as the name of the scoring profile and with a list of fields corresponding to the index. The name is required because queries reference the scoring profile by its name.

Afterwards, a scoring profile weight is applied to the field named “name”. This specifies that, when this scoring profile is used for querying, documents containing the keyword in the name field will be boosted by 100 compared to documents which contain the keyword in other fields.

How do I boost results corresponding to newer documents?

Another common scenario when using search systems is to have newer documents boosted compared to stale documents. In other words, if a new document is added to the index, this specific document could be ranked higher. Considering our examples of football matches, freshness boosting is useful in two ways:

  1. boosting a newly added document could help in selling more tickets to these events sooner
  2. inverted-freshness-boosting: events could be boosted a few days before the match occurs, thus making sure that they will be returned in better positions a few days before the event, even if their original (un-boosted) score isn’t too high

Using Azure Search Client Library, freshness boosting is applied by adding a FreshnessFunction to the list of functions within a scoring profile. Continuing the previous example, it looks like this:

var function1 = new FreshnessFunction()
{
    Boost = 20,
    BoostingDuration = new TimeSpan(0, 13, 15, 18),
    FieldName = "dateadded",
    Interpolation = InterpolationTypes.Logarithmic
};
scoringProfile1.Functions = new List() { function1 };

In this example, a new FreshnessFunction is instantiated with the following properties: the boost applied to any search results that match the keywords is 20, and the boosting is driven by the field named “dateadded”. The boost only applies for 13 hours, 15 minutes and 18 seconds (according to the BoostingDuration property) after the date and time value stored in the “dateadded” field.

How do I boost results based on geolocation?

Considering our football matches example, whenever a user searches for his favorite team’s matches, matches which occur closer to his location could be boosted compared to matches which occur further away. This is also a particularly useful feature for mobile applications or location-aware web applications.

Using Azure Search Client Library, geolocation boosting can be applied by instantiating a DistanceFunction object. Here’s an example:

var function2 = new DistanceFunction()
{
    Boost = 10,
    BoostingDistance = 150,
    ReferencePointParameter = "mylocation",
    FieldName = "geolocation",
    Interpolation = InterpolationTypes.Constant,
};
scoringProfile1.Functions = new List() { function2 };

In the previous example, a DistanceFunction is used when calculating a query’s results using the scoringProfile1 scoring profile. This function instructs the scoring calculator to boost results located within 150 km of a location sent when querying the data through a parameter called “mylocation”. Because of this parameter, the DistanceFunction is a special function: it allows the dynamic calculation of search results based on user input other than keywords. The “geolocation” value of FieldName specifies that the field containing the location of the football match is called “geolocation”. Keep in mind, though, that this field must be of type GeographyPoint.

Note: using Azure Search Client Library version 0.6.5370.1398, you can save location data using the GeographyPoint model class. It exposes Latitude and Longitude properties, thus saving you the trouble of serializing and deserializing geolocation data.
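To get a feeling for what “within 150 km” means for two coordinates, here is a minimal sketch that computes the great-circle distance with the haversine formula and compares it to a boosting distance. It is only an illustration: GeoDistanceSketch and its methods are hypothetical names that are not part of Azure Search or ASCL, and the service may well compute distances differently.

```csharp
using System;

// Hypothetical helper for illustration only; not part of Azure Search or ASCL.
public static class GeoDistanceSketch
{
    private const double EarthRadiusKm = 6371.0; // mean Earth radius

    private static double ToRadians(double degrees)
    {
        return degrees * Math.PI / 180.0;
    }

    // Great-circle distance between two points in kilometres (haversine formula).
    public static double DistanceKm(double lat1, double lon1, double lat2, double lon2)
    {
        double dLat = ToRadians(lat2 - lat1);
        double dLon = ToRadians(lon2 - lon1);
        double a = Math.Sin(dLat / 2) * Math.Sin(dLat / 2)
                 + Math.Cos(ToRadians(lat1)) * Math.Cos(ToRadians(lat2))
                 * Math.Sin(dLon / 2) * Math.Sin(dLon / 2);
        return 2 * EarthRadiusKm * Math.Asin(Math.Sqrt(a));
    }

    // True when the document's location lies within the boosting distance
    // of the reference point (the user's location).
    public static bool IsWithinBoostingDistance(
        double docLat, double docLon, double refLat, double refLon, double boostingDistanceKm)
    {
        return DistanceKm(docLat, docLon, refLat, refLon) <= boostingDistanceKm;
    }
}
```

For example, with a reference point in Seattle (47.6148481, -122.3358423), a match played in Tacoma (roughly 47.2529, -122.4443) falls within 150 km, while one played in Portland (roughly 45.5152, -122.6784) does not.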

How do I boost results based on their rating?

It’s common for huge index repositories to boost search results based on specific values within a range. For example, in a movie database, a movie rated higher by viewers would be boosted compared to poor movies (e.g. IMDb search results for “love” return the 1969 movie called “Women in Love”, rated 7.8 at the time of this writing, in 3rd position, while the 2011 title named “Love Birds”, rated only 5.9, is positioned at the end of the search results page).

Boosting results based on a specific value within a specific range is called magnitude boosting and this is done by using a MagnitudeFunction. Here’s an example using Azure Search Client Library:

var function3 = new MagnitudeFunction()
{
    Boost = 1000,
    BoostingRangeStart = 9,
    BoostingRangeEnd = 10,
    ConstantBoostBeyondRange = false,
    FieldName = "rating",
    Interpolation = InterpolationTypes.Constant
};
scoringProfile1.Functions = new List() { function3 };

In this example, the magnitude function applies a boost of 1000 to documents where the field named “rating” contains a value between 9 and 10.

Notes on scoring profile functions

Even though all the previous examples only populate the Functions collection with a single function, the Azure Search service allows you to use several (or even all) of these functions simultaneously. Moreover, there’s no restriction on using the same function type over and over again, as long as the field and/or dynamic parameters used within the function are different.

In order to use all these functions simultaneously, all you have to do is populate the Functions collection with all the functions, like this:

scoringProfile1.Functions = new List() { function1, function2, function3 };

Keep in mind, though, that the booster applied to a field containing a keyword is not considered a function, for a couple of reasons:

  • first, functions allow the notion of Interpolation which, as the Azure Search REST API explains it, is a way to ‘define the slope for which the score boosting increases from the start of the range to the end of the range’. This notion cannot be applied to text keyword boosting because a field either contains a specific keyword, or doesn’t
  • second, when using several functions within a scoring profile, there’s a notion of aggregating the functions in order to get the final result. As you’ll see next, there’s no point in aggregating these functions with the text booster, because documents which don’t contain the keyword won’t be returned in the search results (or, if no keyword is used, then the booster won’t be used at all, unlike functions which are, or at least can be, still valid for empty queries)

When you specify more than one function within a scoring profile, these functions will be aggregated in order to get the final result score. By default, Azure Search aggregates them by summing the scores of the individual functions. However, you can instruct the score calculator to use other aggregation mechanisms:

  • Maximum: only the maximum score returned by the use of a single function is used, whatever that function’s type is
  • Minimum: the exact opposite of the previous aggregation type
  • Average: rather than summing the scores, their average is calculated and used as the end result; this is useful when you want to lower a result’s score if it doesn’t correspond to all the functions defined within the scoring profile
  • First matching: the first function which matches the scoring profile function definitions is used for calculating the end result; this is similar to a greedy algorithm and has the best performance but might return invalid or unexpected search results
  • Sum: the default aggregation type; sums up all the initial scores using the functions and uses the sum result as end query result score
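The aggregation types above can be sketched in a few lines of C#. This is a simplified model rather than the service’s actual implementation: AggregationSketch and ScoreAggregationSketch are hypothetical names, each entry in the list is assumed to be the score contributed by one function (in definition order), and a score of 0 is assumed to mean that the function did not match the document.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical names for illustration; not the ASCL FunctionAggregationTypes enum.
public enum AggregationSketch { Sum, Average, Minimum, Maximum, FirstMatching }

public static class ScoreAggregationSketch
{
    // functionScores: one score per function, in the order the functions were defined.
    // Assumption: a score of 0 means the function did not match the document.
    public static double Aggregate(IReadOnlyList<double> functionScores, AggregationSketch mode)
    {
        if (functionScores == null || functionScores.Count == 0)
        {
            return 0.0;
        }

        switch (mode)
        {
            case AggregationSketch.Average:
                return functionScores.Average();
            case AggregationSketch.Minimum:
                return functionScores.Min();
            case AggregationSketch.Maximum:
                return functionScores.Max();
            case AggregationSketch.FirstMatching:
                // First non-zero score in definition order, or 0 when nothing matched.
                return functionScores.FirstOrDefault(s => s != 0.0);
            default:
                return functionScores.Sum();
        }
    }
}
```

With scores of 20, 0 and 1000 (say, a fresh document that is far away but highly rated), Sum gives 1020, Average gives 340, Maximum gives 1000, Minimum gives 0 and First matching gives 20. Note how Average lowers the score of a document that doesn’t match every function.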

What happens if I don’t use a scoring profile?

If no scoring profile is used, Azure Search uses a model based on term frequency-inverse document frequency (tf-idf for short), which, according to Wikipedia, is ‘a numerical statistic that is intended to reflect how important a word is to a document in a collection’. More specifically, Azure Search currently uses Lucene’s implementation of an algebraic model called Vector Space Model.

In other words, it checks how frequent a given word is across the index (global frequency) and within the field (local frequency) and thus determines how special a given word is. From this result, Azure Search derives a specific value.

The implications of this model are:

  1. Hits of rare terms (low global frequency) will have higher scores than hits with terms that show up all over the index
  2. The more often a specific term shows up in a field (high local frequency), the higher the score for a hit to that term (within limits, however)
  3. Length normalization: a hit in a short field rates better than a hit on the same term in a longer field. For example, if a field has two terms and one is a hit, this will rate better than a field where the same term is just one of, say, 10 values.

All the results of these calculations are then summed up to produce the score you get when you query for some specific document without using any scoring profiles.
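The three implications can be illustrated with a small sketch. The formulas below are loosely modeled on classic Lucene-style tf-idf (square root of the term frequency, logarithmic inverse document frequency, and a length norm of one over the square root of the field length), but they are heavily simplified, so please treat them as an assumption and not as the exact score Azure Search computes. TfIdfSketch and its parameters are hypothetical names.

```csharp
using System;

// Hypothetical, simplified tf-idf scorer for illustration only.
public static class TfIdfSketch
{
    public static double Score(int termCountInField, int fieldLength,
                               int docsContainingTerm, int totalDocs)
    {
        if (termCountInField <= 0 || fieldLength <= 0 || totalDocs <= 0)
        {
            return 0.0;
        }

        // Local frequency: more occurrences help, but with diminishing returns.
        double tf = Math.Sqrt(termCountInField);

        // Global frequency: rare terms across the index weigh more.
        double idf = 1.0 + Math.Log((double)totalDocs / (docsContainingTerm + 1));

        // Length normalization: a hit in a short field weighs more.
        double lengthNorm = 1.0 / Math.Sqrt(fieldLength);

        return tf * idf * lengthNorm;
    }
}
```

For instance, in an index of 1000 documents, Score(1, 2, 5, 1000) is higher than Score(1, 2, 500, 1000) because the term is rarer, Score(4, 10, 5, 1000) is higher than Score(1, 10, 5, 1000) because the term shows up more often in the field, and Score(1, 2, 5, 1000) is higher than Score(1, 10, 5, 1000) because the field is shorter.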

How do I use a scoring profile when I query my index?

Using Azure Search Client Library, when you query an index you simply have to specify a scoring profile’s name in the QueryParameters object’s property named ScoringProfile. If the scoring profile requires parameters, then you also have to send a Dictionary<string,string> object, where the keys correspond to the parameters’ names and the values correspond to the parameters’ values. Here’s an example:

var scoringParams = new Dictionary<string, string>();
scoringParams.Add("mylocation", "-122.3358423,47.6148481");
var result = await _azureSearchService.Indexes[searchIndex].QueryAsync(new QueryParameters()
    {
        QueryText = searchText,
        ScoringProfile = searchScoringProfile,
        ScoringParameters = scoringParams        
    });

Note: using Azure Search Client Library version 0.6.5370.1398, when you’re sending out a geolocation value as a scoring parameter, keep in mind that:

  • the position is serialized in LONGITUDE and LATITUDE order due to the Azure Search service’s requirements; in a future release (TBD soon), you’ll also be able to use the GeographyPoint data type to get the serialization done out-of-the-box
  • the longitude and latitude values of a coordinate must be separated by a comma
  • decimals are separated using the common English dot separator for decimals

As an additional note, also keep in mind that all scoring parameters defined within a scoring profile must be sent with the query when using that scoring profile. There’s currently no way of specifying a default scoring parameter value.
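The formatting rules for geolocation parameters from the note above are easy to get wrong on machines where the current culture uses a comma as the decimal separator. Here is a minimal sketch of a helper that always produces the expected string; ScoringParameterSketch and FormatGeoPoint are hypothetical names and are not part of ASCL.

```csharp
using System.Globalization;

// Hypothetical helper for illustration only; not part of ASCL.
public static class ScoringParameterSketch
{
    // Serializes a coordinate in LONGITUDE,LATITUDE order, comma-separated,
    // always using the dot as decimal separator regardless of the current culture.
    public static string FormatGeoPoint(double latitude, double longitude)
    {
        return string.Format(CultureInfo.InvariantCulture, "{0},{1}", longitude, latitude);
    }
}
```

Calling ScoringParameterSketch.FormatGeoPoint(47.6148481, -122.3358423) returns "-122.3358423,47.6148481", which is the same value used for the “mylocation” parameter in the query example.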