Friday, December 11, 2020

Azure Data Explorer - Approaches For Data Aggregation In Kusto

In my previous posts, I wrote down the things that were not obvious to me when I first started working with Kusto Query Language. Continuing in the same vein, this post shares a few approaches that can be taken to aggregate data.

Let’s consider the below input data:

  let demoData = datatable(Environment: string, Version: int, BugCount: int)
  [
      "dev", 1, 1,
      "test", 1, 1,
      "prod", 1, 1,
      "dev", 2, 2,
      "test", 2, 0,
      "dev", 3, 2,
      "test", 3, 0,
      "prod", 2, 2,
  ];


The goal is to get the average number of bugs for each environment.

Expected Output (averages rounded up to the next whole number)

  Environment  avg_BugCount
  dev          2
  test         1
  prod         2


There are several approaches to achieve this. 

Approach 1 - Using Partition Operator 

The partition operator first splits the input data into subtables according to the values of a key column, runs the given subquery on each subtable, and then combines all the results.

  demoData
  | partition by Environment (summarize ceiling(avg(BugCount)) by Environment);

Approach 2 - Using Join Operator 

The join operator merges the rows of two tables by matching the values of the specified key.

  demoData
  | join kind=leftouter (
      demoData
      | summarize ceiling(avg(BugCount)) by Environment
    ) on Environment
  | project Environment, avg_BugCount
  | distinct Environment, avg_BugCount;

    Approach 3 - Using Lookup Operator 

    The lookup operator extends the rows of the first table with columns from the second table, looking up each key value of the first table in the second one.

    let Averages = demoData
    | summarize ceiling(avg(BugCount)) by Environment;
    demoData
    | lookup (Averages) on Environment
    | project Environment, avg_BugCount
    | distinct Environment, avg_BugCount
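For comparison, the same per-environment averages can also be computed with summarize alone, without partitioning or joining the table back to itself. The following is a minimal sketch with the sample data inlined so that it can be run on its own; the column name AvgBugCount is chosen here only for illustration.

```kusto
let demoData = datatable(Environment: string, Version: int, BugCount: int)
[
    "dev", 1, 1,
    "test", 1, 1,
    "prod", 1, 1,
    "dev", 2, 2,
    "test", 2, 0,
    "dev", 3, 2,
    "test", 3, 0,
    "prod", 2, 2
];
demoData
// Group the rows by Environment and round each average up to the next whole number
| summarize AvgBugCount = ceiling(avg(BugCount)) by Environment
```

With the sample data this should give 2 for dev, 1 for test, and 2 for prod, which is the same result the three approaches above produce.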

    I hope you enjoyed aggregating data. 

    Happy Kustoing!

    Sunday, November 29, 2020

    Azure Data Explorer - Reading JSON Data Using Kusto

    You may have a requirement where data is stored in a column in JSON format and the business need is to read values from that column. When it comes to JSON, there are a few ways to read this data and present it in a meaningful and readable manner.

    Let’s consider a sample table, demoData, in which the last column, named Description, holds data in JSON format.
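The sample table itself is not reproduced here. As a stand-in, the sketch below defines a hypothetical demoData table with the columns that the queries in this post refer to (Environment, ItemId, and Description). The row values and the exact shape of the JSON are assumptions made purely for illustration.

```kusto
// Hypothetical stand-in for the sample table; all values are invented.
let demoData = datatable(Environment: string, ItemId: int, Description: string)
[
    "dev",  1, '{"Id": 101, "AssignedTo": "Alice"}',
    "test", 2, '{"Id": 102, "AssignedTo": "Bob"}',
    "prod", 3, '{"Id": 103, "AssignedTo": "Carol"}'
];
// Description is a plain string column that happens to contain JSON text
demoData
```

Placing this let statement in front of any of the queries that follow makes them runnable against the stand-in data.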

    Using Dynamic

    One way to extract the data of the Description column is to convert it to the dynamic type with todynamic(), as shown in the query below:

    demoData
    | extend AllProperties = todynamic(Description)
    | project Environment, BugId = AllProperties["Id"], AssignedTo = AllProperties["AssignedTo"]

    On execution of the above query, you will notice that the properties of the JSON are extracted in the form of new columns, here BugId and AssignedTo.

    We can further improve the above query in terms of readability. When a JSON property name is a simple identifier, the property can be accessed directly using dot notation instead of brackets, as shown below for AssignedTo:

    demoData
    | extend AllProperties = todynamic(Description)
    | project Environment, BugId = AllProperties["Id"], AssignedTo = AllProperties.AssignedTo

    The result of this query is the same as that of the previous one.

    Using parse_json

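    Sometimes the requirement is to extract just one or two properties from the JSON column. In such a scenario, creating an intermediate column that holds the entire converted JSON value is unnecessary.

    Here parse_json comes to the rescue, as it can be applied inline and indexed directly. Note that todynamic() and parse_json() are equivalent functions, so the cost of parsing itself is the same; the gain is a shorter query. Below is a sample query to achieve this: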

    demoData
    | extend AssignedTo = tostring(parse_json(Description)["AssignedTo"])
    | project Environment, ItemId, AssignedTo

    On execution of the above query, the result contains the Environment, ItemId, and AssignedTo columns, with AssignedTo returned as a string.
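When only a single property is needed, extract_json() is another option worth considering, since it reads one value by a JSONPath expression. The following is a minimal sketch; the sample rows are hypothetical and exist only to make the query runnable on its own.

```kusto
// Hypothetical sample rows; the values are invented for illustration.
let demoData = datatable(Environment: string, ItemId: int, Description: string)
[
    "dev",  1, '{"Id": 101, "AssignedTo": "Alice"}',
    "prod", 2, '{"Id": 102, "AssignedTo": "Bob"}'
];
demoData
// Read the single AssignedTo value from the JSON text by its JSONPath
| extend AssignedTo = extract_json("$.AssignedTo", Description, typeof(string))
| project Environment, ItemId, AssignedTo
```

If more than one property is needed from the same column, parsing once with parse_json() and reading several properties from the result is usually the more natural choice.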

    Hope you enjoyed extracting JSON data.

    Happy kustoing!

    Wednesday, November 18, 2020

    Perform Calculation On Multiple Values From Single Kusto Input

    Let’s consider a scenario where the requirement is to find the percentage of particular values within a single input set.

    The data below serves as sample input, and the need is to find what percentage of the releases in it are dev releases and what percentage are prod releases.

    let demoData = datatable(Environment: string, Feature: string)
    [
        "dev", "Feature1",
        "test", "Feature1",
        "prod", "Feature1",
        "Dev", "Feature2",
        "test", "Feature2",
        "dev", "Feature3",
        "test", "Feature3",
        "prod", "Feature3"
    ];


    The solution takes several steps, as described below:

    Step 1: Get total number of records from input set

    let totalRecords = demoData
    | count
    | project TotalRecords = Count;

    Step 2: Get only those records which are of type ‘dev’

    let devRecords = demoData
    | where Environment =~ "dev"
    | count
    | project TotalDevRecords = Count;

    Step 3: Get only those records which are of type ‘prod’

    let prodRecords = demoData
    | where Environment =~ "prod"
    | count
    | project TotalProdRecords = Count;

    So far we have all the individual parts. The next task is to combine the three steps above into a single result set, and here comes the challenge.

    The input set holds only two columns, and each of the three queries above returns a single count, so the results share no common field. Without a common field, it is difficult to join them into a single result set.

    Addressing the challenge

    Can’t we introduce a new column just for the sake of joining? Well, let’s see how that changes the three steps above:

    Updated Step 1

    let totalRecords = demoData
    | count
    | extend CommonCol = "Dummy"
    | project CommonCol, TotalRecords = Count;

    Updated Step 2

    let devRecords = demoData
    | where Environment =~ "dev"
    | count
    | extend CommonCol = "Dummy"
    | project CommonCol, TotalDevRecords = Count;

    Updated Step 3

    let prodRecords = demoData
    | where Environment =~ "prod"
    | count
    | extend CommonCol = "Dummy"
    | project CommonCol, TotalProdRecords = Count;

    Now comes the final step, in which we bring all the above result sets together to calculate the percentages.

    Step 4:

    Combining the individual results to get a single result.

    totalRecords
    | join (devRecords | join prodRecords on CommonCol) on CommonCol
    | extend DevRecords = (TotalDevRecords * 100) / TotalRecords
    | extend ProdRecords = (TotalProdRecords * 100) / TotalRecords
    | project DevRecords, ProdRecords;

    On execution of the above steps, you will get the desired output: a single row holding the percentage of dev records and the percentage of prod records.
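As an alternative to joining on a dummy column, the three counts can be computed in a single pass with countif(). This is a different technique from the join-based steps in this post, shown here only as a minimal sketch; the output column names DevPercent and ProdPercent are chosen for illustration.

```kusto
let demoData = datatable(Environment: string, Feature: string)
[
    "dev", "Feature1",
    "test", "Feature1",
    "prod", "Feature1",
    "Dev", "Feature2",
    "test", "Feature2",
    "dev", "Feature3",
    "test", "Feature3",
    "prod", "Feature3"
];
demoData
// One pass over the input: the total count plus two conditional counts
| summarize TotalRecords = count(),
            TotalDevRecords = countif(Environment =~ "dev"),
            TotalProdRecords = countif(Environment =~ "prod")
// Multiplying by 100.0 makes the division floating point instead of integer
| project DevPercent = TotalDevRecords * 100.0 / TotalRecords,
          ProdPercent = TotalProdRecords * 100.0 / TotalRecords
```

With the sample data, 3 of the 8 rows are dev and 2 are prod, so this should return 37.5 and 25. Using 100 instead of 100.0 keeps the arithmetic in whole numbers, which truncates the dev percentage to 37.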

    Hope you enjoyed learning. 

    Happy kustoing.

    Thursday, November 12, 2020

    Working with Kusto Case Sensitivity

    Like most other programming and query languages, Kusto has a notion of case sensitivity, which means it can either distinguish or ignore upper-case and lower-case while performing comparisons between values.

    Let’s consider below sample data:

    let demoData = datatable(Environment: string, Feature: string)
    [
        "dev", "Feature1",
        "test", "Feature1",
        "prod", "Feature1",
        "Dev", "Feature2",
        "test", "Feature2",
        "dev", "Feature3",
        "test", "Feature3",
        "prod", "Feature3"
    ];

    Case Sensitive Comparison

    Case-sensitive means the match must be exact: an upper-case letter matches only the same upper-case letter, and likewise for lower-case. Whenever an upper-case character is compared with its lower-case counterpart, the comparison returns false, even though both are the same letter. For example, dev and Dev are not the same.

    Query description

    Get the list of features that belong to the dev environment.


    demoData
    | where Environment == "dev"

    As “==” stands for case-sensitive comparison, the above query returns only the two rows whose Environment is exactly dev (Feature1 and Feature3); the Dev row for Feature2 is excluded.

    Case Insensitive Comparison

    Case-insensitive comparison behaves in the opposite fashion. Whenever an upper-case character is compared with its lower-case counterpart, the comparison returns true, since both are the same letter. For example, dev and Dev are the same.

    Now, to achieve this behavior there are multiple approaches.

    Approach 1

    In this approach, one can first convert the string using toupper(…) or tolower(…) functions and then perform the comparison as shown below:

    demoData
    | where tolower(Environment) == "dev"

    Approach 2

    In this approach, there is no need to call any extra function, as a built-in operator does this for us, as shown below:

    demoData
    | where Environment =~ "dev"

    Here “=~” performs a case-insensitive comparison.

    Both of the above queries return the same output: all three dev rows, including the Dev row for Feature2.
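The three comparisons can also be checked side by side by counting how many rows each one matches. The following is a minimal sketch over the sample data; the output column names are chosen here for illustration.

```kusto
let demoData = datatable(Environment: string, Feature: string)
[
    "dev", "Feature1",
    "test", "Feature1",
    "prod", "Feature1",
    "Dev", "Feature2",
    "test", "Feature2",
    "dev", "Feature3",
    "test", "Feature3",
    "prod", "Feature3"
];
demoData
| summarize
    CaseSensitive   = countif(Environment == "dev"),          // matches "dev" only
    ViaToLower      = countif(tolower(Environment) == "dev"), // matches "dev" and "Dev"
    CaseInsensitive = countif(Environment =~ "dev")           // matches "dev" and "Dev"
```

With the sample data this should return 2, 3, and 3, because the single Dev row is matched only by the two case-insensitive forms.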


    Performance Tip

    • Always prefer case-sensitive over case-insensitive, wherever possible.
    • Always prefer has or in over contains.
    • Avoid searching for very short strings where possible, as they make poor use of the term index.
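As a rough illustration of the first two tips, the sketch below filters the sample data with the case-sensitive in operator and the case-sensitive term operator has_cs instead of =~ and contains. The choice of filter values is an assumption made for the example.

```kusto
let demoData = datatable(Environment: string, Feature: string)
[
    "dev", "Feature1",
    "test", "Feature1",
    "prod", "Feature1",
    "Dev", "Feature2",
    "test", "Feature2",
    "dev", "Feature3",
    "test", "Feature3",
    "prod", "Feature3"
];
demoData
// Case-sensitive set membership; the "Dev" row is not matched
| where Environment in ("dev", "prod")
// Case-sensitive whole-term match rather than a substring scan with contains
| where Feature has_cs "Feature1"
```

With the sample data this should return the dev and prod rows for Feature1. When the casing of the stored values is not known in advance, the case-insensitive counterparts in~ and has can be used instead.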

    Happy kustoing!