Indicators can be compared to a data structure such as a struct or a class, in which the information related to an entity is stored. In big data systems there can be multiple indicators for each of millions of entities, so the design of an indicator becomes very important because it affects both performance and storage.
When designing an indicator, we must consider the following points so that it works optimally:
- The indicator structure should be very efficient when inserting data
- Single-element or range extraction should be fast
- Resources should be used optimally
- The indicator should be self-managed and should housekeep itself
Sift indicators are used to store the behaviour of a subscriber in real time. Keeping all the above points in mind, in Sift we decided to use a JSON object as the base data structure. After handling huge amounts of data in the telecom sector (for now), we have come up with some design patterns that are very useful and efficient.
Below are the various Sift indicator design patterns:
Simple Key-Value Indicators- These indicators are generally used to store primitive information such as name, age, gender, segment, last known location, identification number, etc. They can be used in 2 ways, as shown below:
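The original illustration is not reproduced here, so the following is only a minimal Python sketch of a simple key-value indicator held as a JSON object. All field names and values are hypothetical, and the "single value" versus "grouped values" layouts are an assumed reading of the two usages, not necessarily the ones Sift uses.

```python
import json

# Assumed layout 1: one indicator holding a single primitive value.
age_indicator = {"age": 34}

# Assumed layout 2: related primitive attributes grouped in one indicator.
profile_indicator = {
    "name": "Jane Doe",
    "age": 34,
    "gender": "F",
    "segment": "prepaid",
    "lastKnownLocation": "Pune",
}

# Both layouts serialize to plain JSON and read back unchanged.
serialized = json.dumps(profile_indicator, sort_keys=True)
restored = json.loads(serialized)
print(restored["segment"])
```

Either layout keeps inserts and single-element reads as direct key lookups.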
Key Array Series Indicators- These are complex indicators that keep information at a daily level. They also have a date-index access mechanism, which helps in extracting data for a range of dates and in performing optimal self-housekeeping, which eventually saves storage. The same approach can be implemented for weekly or monthly data.
For example, suppose we want to store information about daily expenses on groceries and mobile. We should be able to query on a date range or on an exact date, and we want to keep only the last 30 days on which an expense happened. The structure will then be as below.
Note: The dates are shown in human-readable format just to make the concept easier to understand. The dates must be in epoch format to perform optimally.
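As a minimal Python sketch of such a structure: the "keys" array name comes from the text, while the "groceries" and "mobile" field names, the helper functions, and the choice to store each day's entry at the top level under its epoch value are assumptions made for illustration.

```python
import bisect
import json
from datetime import date, datetime, timezone


def to_epoch(day: date) -> int:
    """Return the epoch second of UTC midnight for a calendar day."""
    return int(datetime(day.year, day.month, day.day, tzinfo=timezone.utc).timestamp())


def add_expense(indicator: dict, day: date, groceries: float = 0.0, mobile: float = 0.0) -> None:
    """Record an expense for a day, keeping the "keys" array sorted."""
    key = to_epoch(day)
    entry_key = str(key)  # JSON object keys must be strings
    if entry_key not in indicator:
        bisect.insort(indicator["keys"], key)  # sorted insert of the new date
        indicator[entry_key] = {"groceries": 0.0, "mobile": 0.0}
    indicator[entry_key]["groceries"] += groceries
    indicator[entry_key]["mobile"] += mobile


indicator = {"keys": []}
add_expense(indicator, date(2024, 1, 3), groceries=42.5)
add_expense(indicator, date(2024, 1, 1), groceries=20.0, mobile=10.0)
add_expense(indicator, date(2024, 1, 3), mobile=5.0)

print(json.dumps(indicator, indent=2))
```

A single-date fetch is then a direct lookup such as `indicator.get(str(to_epoch(date(2024, 1, 3))))`, with no scan of the "keys" array.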
In the above JSON object, 2 types of data are stored:
- Sorted array of all days on which any expense happened.
- Actual expense amount for groceries and mobile on a date.
Having the “keys” array has 2 benefits:
- It allows the data to be housekept as soon as it goes over 30 elements. We can easily get the size of the array and, as soon as the size becomes 31, remove the first element and the corresponding actual expense entry from the indicator. Because the size is bounded, this also helps to forecast the sizing requirements.
- It provides fast access to data over a date range. Let’s say we need the expenses from the last 15 days. One way would be to try to fetch the data for each of the last 15 days one by one and sum the expenses for whichever days are available. But this requires a loop of 15 iterations, one for each day, which is not optimal. Instead, we can iterate over the sorted date keys array, take only those days that fall within the last 15 days’ range, and break out of the loop as soon as we reach the last day of concern. This way, we iterate only as many times as there are days on which an expense happened.
Having a date entry for each actual expense value helps with single-date fetches.
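Both benefits can be sketched in Python as follows. This is an illustration under assumptions rather than Sift's implementation: the function names, the "groceries" and "mobile" fields, and the top-level per-day entries keyed by epoch value are hypothetical, and only the "keys" array and the 30-day limit come from the text.

```python
MAX_DAYS = 30   # retention limit from the example in the text
DAY = 86_400    # seconds in one day


def housekeep(indicator: dict, max_days: int = MAX_DAYS) -> None:
    """Remove the oldest dates (and their entries) until max_days remain."""
    keys = indicator["keys"]
    while len(keys) > max_days:
        oldest = keys.pop(0)              # keys are sorted, so index 0 is oldest
        indicator.pop(str(oldest), None)  # drop the matching expense entry


def sum_expenses(indicator: dict, start: int, end: int) -> float:
    """Sum expenses for dates in [start, end], walking keys newest-first."""
    total = 0.0
    for key in reversed(indicator["keys"]):
        if key > end:
            continue  # newer than the requested range
        if key < start:
            break     # all remaining keys are older, so stop early
        entry = indicator[str(key)]
        total += entry["groceries"] + entry["mobile"]
    return total


# Hypothetical data: an expense on every other day, 31 days in total.
indicator = {"keys": []}
for n in range(0, 62, 2):
    indicator["keys"].append(n * DAY)
    indicator[str(n * DAY)] = {"groceries": 10.0, "mobile": 1.0}

housekeep(indicator)  # the 31st element triggers removal of the oldest day
last_15_days = sum_expenses(indicator, start=46 * DAY, end=60 * DAY)
print(len(indicator["keys"]), last_15_days)
```

The loop in `sum_expenses` visits only days that actually have an entry, so its cost depends on the number of stored days in range rather than on the width of the date range.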
Activity Block Indicators- This is an improvement on “Key Array Series Indicators”. This type of indicator stores detailed information about an activity after every instance of some other activity. For example, every time a person travels outside the country, we may want to know the destination country, ticket expense, hotel expense, food expense, the locations he visited, and the dates on which he visited them. This is presented as below in this type of indicator:
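A minimal Python sketch of such an activity block, assuming one block per trip keyed by the trip's epoch date and indexed through a sorted "keys" array as in the previous pattern. All field names, helper functions, and values are hypothetical.

```python
import bisect


def add_trip(indicator: dict, departure: int, block: dict) -> None:
    """Store one activity block under its trip date and index it in "keys"."""
    if str(departure) not in indicator:
        bisect.insort(indicator["keys"], departure)
    indicator[str(departure)] = block


def average_trip_spend(indicator: dict) -> float:
    """Average of ticket + hotel + food expense across all stored trips."""
    keys = indicator["keys"]
    if not keys:
        return 0.0
    total = 0.0
    for key in keys:
        block = indicator[str(key)]
        total += block["ticketExpense"] + block["hotelExpense"] + block["foodExpense"]
    return total / len(keys)


trips = {"keys": []}
add_trip(trips, 1706745600, {  # epoch seconds of a hypothetical trip date
    "destinationCountry": "JP",
    "ticketExpense": 600.0,
    "hotelExpense": 450.0,
    "foodExpense": 150.0,
    "locations": [{"name": "Tokyo", "date": 1706832000}],
})
add_trip(trips, 1704067200, {  # an earlier hypothetical trip
    "destinationCountry": "FR",
    "ticketExpense": 400.0,
    "hotelExpense": 300.0,
    "foodExpense": 100.0,
    "locations": [{"name": "Paris", "date": 1704153600}],
})

print(trips["keys"], average_trip_spend(trips))
```

Each trip's details share a single "keys" entry, instead of one key array per attribute.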
This type of indicator has all the advantages of Key Array Series indicators. On top of that, it saves much more storage because we store multiple pieces of information in one structure. The alternative is to create a separate Key Array Series indicator for each piece of information, but that requires more storage because the keys entries and the complete structure are repeated.
Functionally, it provides a view of his activity every time he goes out of the country: how much he spends on his trips on average and the kinds of places he likes to visit. Linking this information with a third party such as “expedia.com” may help to serve the customer better or even to provide offers to him.
Time Sensitive Indicators- These are a special type of indicator because they update only in time-based scenarios. For example, an indicator should be updated only once a day, or only by the latest record and not by older records. In these cases, we store the last update timestamp in the indicator body and use that timestamp to decide whether a record will update the indicator. Below is an example in which a subscriber’s mobile balance should be updated only by newer records, not by older ones:
Only if a record arrives with a timestamp greater than “lastestUpdateTimeStamp” will it be able to update “Balance”. This keeps the mobile balance at the latest value known to the application. If we don’t keep the timestamp, an older record can arrive and overwrite the balance with an older value, which would be wrong.
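The update rule can be sketched in Python as follows. The “Balance” and “lastestUpdateTimeStamp” field names are taken from the text, while the function name and the sample values are hypothetical.

```python
def apply_balance_record(indicator: dict, balance: float, record_ts: int) -> bool:
    """Update "Balance" only when the record is newer than the stored timestamp."""
    if record_ts > indicator.get("lastestUpdateTimeStamp", 0):
        indicator["Balance"] = balance
        indicator["lastestUpdateTimeStamp"] = record_ts
        return True
    return False  # stale record: leave the indicator untouched


indicator = {"Balance": 50.0, "lastestUpdateTimeStamp": 1_700_000_000}

# A newer record updates the balance and the timestamp.
applied_new = apply_balance_record(indicator, 35.0, 1_700_000_600)

# A late-arriving older record is ignored.
applied_old = apply_balance_record(indicator, 80.0, 1_699_999_000)

print(applied_new, applied_old, indicator)
```

The strict comparison also means that a record carrying the same timestamp as the stored one does not overwrite the balance.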
We arrived at these indicator design patterns through experience handling huge amounts of data at different clients (telecoms). Which design to use depends on the requirements of each indicator. In our experience, these patterns cover nearly all requirements.