Sift Audience Data Query Language (DQL) – 1

I had been thinking long and hard on enhancing the Audience Management capability in Sift. Everybody seems to have it – from the mighty big players to the obscure ones. In such a crowded space, it is of prime importance to have a very clear and a striking “unique selling proposition” before you boast that you have this capability too. Basically, the tool shall solve a problem that was very challenging previously, in a cost effective way that was not thought of commonly.

Now, Sift is a real time analytics engine. So obviously, the profiles it builds about individual entities are accurate – up to the second. That gets a lot of brownies to start with. And as Sift would limitlessly process data, the data points it can compute for a single entity would also be much wider in range. This is great too. But,

1. What if I add “relationships between the entities” into this mix?

For example, retrieving “all customers who just visited a particular online storefront” is a usual real time audience query. But to retrieve “all customers who just visited a particular online storefront and also all the high spending members in their social group“, is what I am talking about. Wait a minute, this sounds like the Facebook flavour of challenge? Yes. But as we saw before, Sift has the advantage of dealing with very deep first party data. Therefore the profiles in Sift would be much deeper, wider, “real time” (and could be much richer than Facebook itself). Therefore, adding relationships to this mix makes it insanely powerful.

2. Now, what if I also add an exciting DQL (Data Query Language) to manipulate the audience data?

Packed inside an intuitive API, the DQL would bring extreme flexibility and richness by which any client can combine, mix and match and manipulate the audience data. Such flexible APIs would make Sift a powerful DMP platform and a critical ingredient to embark on the data monetization journey.

I think these additional 2 aspects are super convincing unique selling propositions by themselves. To add these capabilities, I ended up building a very special purpose database, which is also web-scale and in-memory. More on that exciting component later. The focus of this blog is to talk about the DQL and the APIs for playing with Sift’s Audience Management feature.

To start with, if you want to query the list of profiles(we call them ‘Active Behaviour Clusters’ or in short ‘ABCs’) that Sift tracks about any entity, you may try this query :

curl -d '{ "query" : "?:ABC","resultType" : "reducedValues", "executionMode" : "async"}' -H "Content-Type: application/json" -X POST http://13.76.166.53:8111/api/v1/query

Sift would return a result as below, depending on what is defined in this cloud instance of Sift at the time of query. (Yes, that ip address hits a sandbox with a purposely relaxed security)

{"status":"success","query":"?:ABC","processingTime(ms)":2,"total":40,"result":["HiRF_MiSL:ABC","MiRF_HiSL:ABC","MASS:ABC","RECOVERED_SILENT:ABC","MIGRANT_NEWBIE:ABC","SILENT_HIGH_ARPU:ABC","MIGRANT:ABC","ATL_GROUP:ABC","TOURIST:ABC","NEWBIE:ABC","BTL_GROUP:ABC","HiRF_HiSL:ABC","IN_SilomDistrict:ABC","ACTIVE:ABC","BTL_SPECIAL_GROUP:ABC","BURN_CHKBAL_CLUSTER:ABC","EXISTING:ABC","IN_AllSeasons:ABC","IN_SiamSquareShopping:ABC","BURN_SOS_CLUSTER:ABC","MiRF_MiSL:ABC","SUSPENDED:ABC","SILENT_SEED:ABC","MIGRANT_EXISTING:ABC","HiRF_LoSL:ABC","ROCKET_REFILL_NON_RESPONDE:ABC","BURN_MOBILEON_CLUSTER:ABC","REFILL_SEED:ABC","MiRF_LoSL:ABC","MASS_EXISTING:ABC","SILENT_LOW_ARPU:ABC","BURN_REFILL_CLUSTER:ABC","LoRF_LoSL:ABC","CLT_MNP_IMMEDIATE_FILE:ABC","CLT_MNP_HOLD_FILE:ABC","LoRF_MiSL:ABC","BURN_ENDCALL_CLUSTER:ABC","LOW_BALANCE:ABC","LoRF_HiSL:ABC","MASS_NEWBIE:ABC"]}

Let us dissect this result first. You would notice that the result is a Json output and it has an array to hold the actual results. Each entry in the result array ends with “ABC”, just to say that this is an Active Behaviour Cluster. In the query you asked for ?:ABC. This is what I call a Schema Query, which would list all the ABCs. In the query, you also mentioned that that the “resultType” is “reducedValues”, which means to list all the values combined in one array. You may also try this query by replacing the “reducedValues” to “count”.

Now, having seen the list of ABCs in Sift, you may also query how many customers (or entities) have been classified to be in a particular ABC.

curl -d '{ "query" : "ACTIVE:ABC","resultType" : "count", "executionMode" : "async"}' -H "Content-Type: application/json" -X POST http://13.76.166.53:8111/api/v1/query

I got this result but you may get a different one depending on the state at which you try

{"status":"success","query":"ACTIVE:ABC","processingTime(ms)":1,"result":100}

This is to say that there are 100 customers that are ACTIVE at this very second (you can say that these are people who have active contracts with a telco, a bank or a retail and are doing regular transactions). Now, what if I want to ask for ACTIVE customers who are also falling into LOW_BALANCE profile ?

curl -d '{ "query" : "(AND LOW_BALANCE:ABC ACTIVE:ABC)","resultType" : "count", "executionMode" : "async"}' -H "Content-Type: application/json" -X POST http://13.76.166.53:8111/api/v1/query

What if I want to ask how many such low balance customers are there in “Siam Square Shopping Center” geofence, right at this second ?

curl -d '{ "query" : "(AND IN_SiamSquareShopping:ABC (AND LOW_BALANCE:ABC ACTIVE:ABC))","resultType" : "count", "executionMode" : "async"}' -H "Content-Type: application/json" -X POST http://13.76.166.53:8111/api/v1/query
{"status":"success","query":"( AND IN_SiamSquareShopping:ABC (AND LOW_BALANCE:ABC ACTIVE:ABC))","processingTime(ms)":5,"result":29}

I got the above results. 29 of my 100 active customers are currently having a low balance and in the geofence I am tracking. If you want to find out who they are, you may change the resultType to “reducedValues” and you would get their anonymous identifiers. Isn’t that insanely powerful? In reality, the results would be tens of millions and the API support pagination to fetch all the results.

And you can quickly launch an engagement (campaigns) in Sift for them in a matter of few minutes by using Sift Portal. Kuldeep shares a very exciting blog on how you can use Sift’s beautiful analytical visualisers to manipulate these audiences and launch engagements.

For those DQL lovers, here is the BNF grammar version of the Sift’s Audience Management DQL

GraphQuery -> SetOperationQuery
GraphQuery -> ApplyOperationQuery
GraphQuery -> FilterOperationQuery
GraphQuery -> SchemaQuery
GraphQuery -> EdgeQuery
 
SetOperationQuery -> ( SET_OPERATION GraphQuery GraphQuery )
ApplyOperationQuery -> ( APPLY_OPERATION EDGE GraphQuery )
FilterOperationQuery -> ( FILTER_OPERATION pattern GraphQuery )
SchemaQuery -> \w+?\w+
EdgeQuery -> \w+:\w+
 
EDGE  -> \w+:
SET_OPERATION -> AND|OR|DIFFERENCE
APPLY_OPERATION -> APPLY
FILTER_OPERATION -> FILTER
pattern -> \w+=\w+

In the next part, how we an use Sift’s DQL to query for audience with relationships. It excites me every time I talk about it.

Share This Post
Have your say!
5 0
1 Comment

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>