Data Export API

The Data Export API provides a way of fetching raw data in the form of player events and sessions.

Using the API

Exporting data

The main way to export data is to provide either a customer or a customer/business unit combination, together with a time period. The API then returns a streamed response that the client can consume in whatever way suits it.

The API provides five endpoints for this. All are accessed with a GET request, except the limited sessions endpoint, which accepts both GET and POST requests:

/v1/export/customer/{customer_name}/events for raw event data corresponding to an entire customer.
/v1/export/customer/{customer_name}/business_unit/{business_unit}/events for raw event data corresponding to a specific business unit.
/v1/export/customer/{customer_name}/sessions for session summaries corresponding to an entire customer.
/v1/export/customer/{customer_name}/business_unit/{business_unit}/sessions for session summaries corresponding to a specific business unit.
/v1/export/customer/{customer_name}/business_unit/{business_unit}/sessions/limited for limited session summaries, with some filtering functionality.

The time period is selected with the start and end query parameters. Both are required for every request and should be provided in ISO 8601 format.
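As an illustration, the sketch below assembles the URL for one of the full export endpoints with the required start and end parameters. The host in BASE_URL and the helper name build_export_url are hypothetical, the customer and business unit names are borrowed from the limited export example on this page, and authentication is left out because it is not described here.

```python
from datetime import date
from typing import Optional
from urllib.parse import quote, urlencode

# Hypothetical placeholder host; the real base URL is not given on this page.
BASE_URL = "https://export.example.invalid"


def build_export_url(
    customer: str,
    start: date,
    end: date,
    business_unit: Optional[str] = None,
    kind: str = "events",
) -> str:
    """Build the URL for a full events or sessions export."""
    if kind not in ("events", "sessions"):
        raise ValueError("kind must be 'events' or 'sessions'")
    path = "/v1/export/customer/" + quote(customer, safe="")
    if business_unit is not None:
        path += "/business_unit/" + quote(business_unit, safe="")
    path += "/" + kind
    # start and end are both required; isoformat() yields ISO 8601 dates.
    query = urlencode({"start": start.isoformat(), "end": end.isoformat()})
    return f"{BASE_URL}{path}?{query}"


url = build_export_url(
    "FantasticTV",
    date(2022, 1, 1),
    date(2022, 2, 1),
    business_unit="FantasticContent",
    kind="sessions",
)
print(url)
```

Leaving out business_unit produces the customer-wide variant of the same endpoint.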

After a request is made to one of the above endpoints, the API returns the response as a data stream (the limited export also offers a non-streaming response). The client fetching the data therefore needs to consume it as a stream in order to avoid potential memory problems. The data returned by the API consists of JSON objects separated by newlines.

Note that since the analytics pipeline at Red Bee is constantly evolving to suit new needs, the information in events and session summaries might change over time. Because of this, an exported record might contain several empty fields. This is usually an indication that these fields are no longer ingested into the analytics database but are kept for legacy reasons. For the same reason, we do not guarantee any sort of backwards compatibility in case the data structure changes.
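A minimal sketch of consuming such a stream one record at a time, so that the full export never has to be held in memory. The helper name iter_records and the sample records are made up for illustration. Any iterable of byte lines works as input; with the Python standard library, the response object returned by urllib.request.urlopen can be iterated line by line in this way.

```python
import json
from typing import Iterable, Iterator


def iter_records(lines: Iterable[bytes]) -> Iterator[dict]:
    """Yield one decoded JSON object per non-empty line."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        yield json.loads(line)


# Stand-in for a streamed response body; the values are made up.
sample = [b'{"SessionId": "a", "City": ""}\n', b"\n", b'{"SessionId": "b"}\n']

records = []
for record in iter_records(sample):
    # Use .get() because fields may be empty or absent in a given record.
    records.append((record.get("SessionId"), record.get("City")))
print(records)
```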

Limited session export

Sometimes you might not be interested in a complete data dump and instead want to fetch individual sessions for a specific user or asset. In that case you can use the limited sessions export, which comes with some basic filtering functionality. When sending filters, a POST request should be used instead of a regular GET.

The export takes the start and end query parameters in order to select a time period. In addition, a limit parameter can be provided (default is 10, maximum is 100), which tells the API how many records to return. Records are returned in order from newest to oldest. To filter for a specific user or asset id, one or more filters can be supplied in the request body.

Putting this together, the following request can be made:

/v1/export/customer/FantasticTV/business_unit/FantasticContent/sessions/limited?start=2022-01-01&end=2022-02-01&limit=10
{
    "filters": {
        "user.meta.account_id" : ["user1"]
    }
}

This would fetch the last 10 sessions played by the account with the id user1.
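The same request can be prepared in Python roughly as follows. This is a sketch under assumptions: the host in BASE_URL and the helper name build_limited_request are hypothetical, the Content-Type header is a guess since the page does not name one, and authentication is omitted. The request object is only built here, not sent.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "https://export.example.invalid"  # hypothetical placeholder host
MAX_LIMIT = 100  # documented maximum for the limit parameter


def build_limited_request(customer, business_unit, start, end, limit=10, filters=None):
    """Prepare (but do not send) a limited sessions export request."""
    if limit > MAX_LIMIT:
        raise ValueError(f"limit must not exceed {MAX_LIMIT}")
    path = (
        f"/v1/export/customer/{quote(customer, safe='')}"
        f"/business_unit/{quote(business_unit, safe='')}/sessions/limited"
    )
    query = urlencode({"start": start, "end": end, "limit": limit})
    url = f"{BASE_URL}{path}?{query}"
    if not filters:
        return Request(url, method="GET")
    # Filters travel in the request body, so POST replaces GET.
    body = json.dumps({"filters": filters}).encode("utf-8")
    # The Content-Type header is an assumption; the source does not name one.
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


request = build_limited_request(
    "FantasticTV",
    "FantasticContent",
    "2022-01-01",
    "2022-02-01",
    limit=10,
    filters={"user.meta.account_id": ["user1"]},
)
print(request.get_method(), request.full_url)
```

Passing the prepared object to urllib.request.urlopen would send it; without filters the helper falls back to a plain GET.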

The following filters are available when using the endpoint:

Name	Description
content.asset.id	String uniquely identifying the content in our system
user.meta.id	String uniquely identifying the user in our system
user.meta.account_id	String uniquely identifying the account in our system
viewing.meta.started	Flag to show only sessions that started playback, i.e. that received a frame of video content

The response returned from the limited export will be a JSON object with all the sessions in a list:

{
    "data" : [
        records-go-here
    ]
}

The API can also be set to return the data as a streaming response. This is controlled by the query parameter as_json (default is True). If it is set to false, the response is sent as a stream of records instead.
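A client that supports both modes could normalise the two response shapes along these lines. The helper name parse_limited_body is made up, and the sketch assumes that the streamed variant uses the same newline-separated JSON objects as the full exports, which this page does not state explicitly.

```python
import json
from typing import List


def parse_limited_body(body: bytes, as_json: bool = True) -> List[dict]:
    """Return the session records from a limited export response body."""
    if as_json:
        # Default mode: a single JSON object with the records under "data".
        return json.loads(body)["data"]
    # Streamed mode (assumed format): one JSON object per line.
    return [json.loads(line) for line in body.splitlines() if line.strip()]


# Made-up sample bodies carrying the same two records in both shapes.
json_body = b'{"data": [{"SessionId": "a"}, {"SessionId": "b"}]}'
stream_body = b'{"SessionId": "a"}\n{"SessionId": "b"}\n'

print(parse_limited_body(json_body) == parse_limited_body(stream_body, as_json=False))
```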

Account session data

In order to fetch session data related to a specific account the following endpoint can be used:

/v1/export/customer/{customer_name}/business_unit/{business_unit}/account/{account_id}/sessions

The endpoint returns all sessions for a specific account, including user information for each session.

When fetching account session data, the response is returned as a streaming response and should be consumed as such by the client making the request.

The data returned by the API consists of JSON objects separated by newlines, each containing the following fields:

Field	Description
time	The time (in UTC) when the session was created
account_id	The account id of the account playing the session
country	The country in which playback was registered
city	The city in which playback was registered
device_model	The model of the device doing playback
device_os	The OS of the device doing playback
device_os_version	The OS version of the device doing playback

Some of the fields might be null if the player sending the analytics events didn't send the relevant information.
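Because any of these fields can be null, it can help to map each line onto a type where every field is optional. The sketch below does this with a dataclass. The class name AccountSession and the sample values are made up, and time is kept as a string because its exact format is not specified here.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class AccountSession:
    """One record from the account sessions endpoint; every field may be null."""

    time: Optional[str] = None
    account_id: Optional[str] = None
    country: Optional[str] = None
    city: Optional[str] = None
    device_model: Optional[str] = None
    device_os: Optional[str] = None
    device_os_version: Optional[str] = None

    @classmethod
    def from_line(cls, line: bytes) -> "AccountSession":
        raw = json.loads(line)
        # Keep only the documented fields; missing or null values become None.
        return cls(**{f.name: raw.get(f.name) for f in fields(cls)})


# Made-up sample line with one null field and several fields left out.
line = b'{"time": "2022-01-01T12:00:00Z", "account_id": "user1", "city": null}'
session = AccountSession.from_line(line)
print(session.account_id, session.city, session.device_os)
```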

Data dumps

By using the API you can retrieve two types of data dumps: raw Player Events and Session Summaries. Both are described below.

Player Events

These are the raw events sent in from our player SDK. Each event describes some sort of behaviour in the player during a playback session. More information about these events can be found here. In addition, the events contain some data added by the backend (usually denoted in lower case).

Session Summaries

Whenever a player starts playing some media (or attempts to start playing), a number of events are created to describe the player's behaviour during that playback. These are the events described above. The events are then aggregated into a session summary, and session summaries are used as the basis for most of our analytics reports. A single session summary describes a single playback session by a single user.

The session summaries contain the following fields:

Field	Type	Description	Legacy/Not used
AccountId	str	The account id of the account doing playback	no
ActivateAtPlay	bool	Indicator if the product offering connected to the asset was activated when playback started	no
AnalyticsBucket	int	Backend bucketing of analytics event	no
AnalyticsPostInterval	int	The time the player should buffer analytics events before sending them as a batch to the backend	no
AnalyticsTag	str	Backend tagging of analytics event	no
Anonymous	bool	Indicator if the playback was anonymous or not	no
AppType	str	Description of what type of app was used for playing the session	no
AssetId	str	Id of the asset used when playing the session	no
AssetRuntime	int	The length of the asset as reported by the player if available	yes
AssetSubtype	str	The subtype of the asset that was played	yes
AssetTitle	str	The human readable name of the asset	yes
AutoPlay	bool	Indicator if the session was started due to autoplay triggering the session start	no
AverageBitrate	double	The average bitrate across the session. This is a weighted average so the longer a session stays in a specific bitrate the heavier that bitrate will impact the average	no
Browser	str	The manufacturer/developer of the browser used when playing the session	no
BrowserFamily	str	The family of the browser in which playback was started	no
BufferingEvents	int	Number of events which indicated that buffering was started for the session	no
BufferingRatio	double	The ratio between time spent buffering and the time not spent buffering. E.g. a session that is 10m long and spent 5m buffering and 5m playing would have a ratio of 1.0 (100%). A value over 100% indicates that the user spent more time buffering than watching.	no
BufferingRatioOfTotalDuration	double	The ratio between time spent buffering and the total session duration. E.g. a session that is 10m long and spent 5m buffering and 5m playing would have a ratio of 0.5 (50%)	no
BusinessUnit	str	The Red Bee business unit to which the session belongs	no
CDNVendor	str	The CDN vendor used when streaming the session	no
ChannelId	str	Id of the channel that was played	no
ChannelName	str	The human readable name of the channel	yes
City	str	The city where playback was initiated	no
ConnectionType	str	The connection type used when playing the session	yes
ContainedErrors	bool	Indicator if the session contained any known errors	no
Country	str	The country where playback was initiated	no
Customer	str	The Red Bee customer name to which the session belongs	no
DeviceId	str	The unique id of the device where playback occurred	no
DeviceModel	str	The model name of the device used for playing the session	no
DeviceType	str	The device type used when playing the session	yes
DisplayTitle	str	The human-readable title of the asset	yes
DrmCertificateRequests	int	Number of DRM certificate requests sent by the player during the session (usually one, but there can be multiple)	no
DrmCertificateResponses	str	Number of DRM certificate responses received by the player during the session (usually one, but there can be multiple)	no
DrmLicenceRequests	int	Number of DRM licence requests sent by the player during the session (usually one, but there can be multiple)	no
DrmLicenceResponses	int	Number of DRM licence responses received by the player during the session (usually one, but there can be multiple)	no
DRMType	str	The type of DRM used on the asset (Widevine, FairPlay etc)	no
EntitlementId	str	The id assigned to the session by the entitlement call	no
EntitlementType	str	The type of the entitlement request made by the player	yes
ErrorCode	str	The error code reported by the player in case of error	no
ErrorMessage	str	The error message encountered by the player in case of error	no
EventCount	int	Number of raw player events that were processed for the session	no
ExternalProductId	str	The external product id of the product that the played asset belongs to	yes
FirstEventTime	int	The epoch time of the first event for the entire session	no
Format	str	The playback format used during the session	yes
Height	int	The screen height used on the device playing the session	no
IncompleteBufferingEvents	int	Number of buffering events that were incomplete (i.e. missing either buffering started or buffering ended)	no
Intent	str	The intent (PLAY/DOWNLOAD) of the player when requesting playback	no
LastEvent	str	The last event of the session	no
LastEventTime	int	Epoch timestamp of the last event sent by the player	no
LastPlayedOffset	int	The last played offset time (the position of the player in the stream)	no
MandatoryValidationFailed	bool	Indicator if the session failed the mandatory analytics pipe validation checks	no
Manufacturer	str	The company manufacturing the device used to play the session	no
MaxBitrate	double	Maximum bitrate value achieved during the session	no
MediaLocator	str	URL of the media locator used for playback	no
MinBitrate	double	The minimum bitrate played during the session	no
Model	str	The asset model used for the played asset	yes
MostUsedBitrate	double	The bitrate that was used the most (i.e. the longest time) during the session	no
Name	str	The name of the layout engine used if a browser was used for playback	no
OptionalValidationFailed	bool	Indicator if the session failed the optional analytics pipe validation checks	no
OS	str	The OS used by the device that played the session	no
OSVersion	str	The version number of the OS used on the device where the session was played	no
PackageId	str	Id of Red Bee program package used	yes
PercentagePlayed	double	Percentage of the asset that was played if the length of the asset was known to the player.	no
Player	str	The name of the player SDK used for playing the session	no
PlayerVersion	str	The version number of the player SDK used for playing the session	no
PlayMode	str	The type of asset that is being played (vod, live etc)	yes
PlayType	str	The type of asset that is being played (vod, live etc)	no
ProductId	str	The id of the product to which the asset that was played belongs	no
ProgramId	str	The id of the last program that was played during the session	no
ProgramName	str	The human-readable title of the program	yes
PublicationId	str	Id of the publication to which the asset belongs	no
Resolution	str	The screen resolution used on the device playing the session	no
RootedOrJailbroken	bool	Indicator if the device was rooted or jailbroken	no
SessionCompleted	bool	Indicator if the session ended with either a playback completed/aborted event or an error	no
SessionId	str	The unique id of the session	no
SessionStarted	bool	Indicator if playback was started (i.e. if the session received the proper started event)	no
SessionToken	str	The session token reported by the player	no
StartupDurationMs	double	Number of ms from session initialization to playback starting	no
SupportedDevice	bool	Indicator if the device is supported by the player framework	no
TimeSpentBufferingMs	str	The number of milliseconds the session spent buffering in total	no
TotalAdCompleted	int	Number of events that indicated that an ad had completed playing during the session	no
TotalAdFailed	int	Number of events that indicated that an ad had failed playing during the session	no
TotalAdStarted	str	Number of events that indicated that an ad had started playing during the session	no
TotalUsageMegabyte	double	Experimental calculation of how many MB were consumed during session playback	no
Type	str	The device type reported by the player	yes
UserAgent	str	The user agent string as reported by the device playing the session	no
UserId	str	The unique id for the user doing playback	no
VideoLength	int	The length of the asset as reported by the backend if available	no
ViewingDurationMs	int	The total number of milliseconds that the player was playing content (i.e. not in a paused state)	no
Width	int	The screen width used on the device playing the session	no
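To make the difference between BufferingRatio and BufferingRatioOfTotalDuration concrete, and to show what a time-weighted AverageBitrate means, the sketch below recomputes them from the descriptions in the table. It is not the pipeline's actual implementation: the function names and the (bitrate, seconds) input shape are made up for illustration.

```python
from typing import List, Optional, Tuple


def buffering_ratio(buffering_ms: int, total_ms: int) -> Optional[float]:
    """Time spent buffering divided by time NOT spent buffering."""
    playing_ms = total_ms - buffering_ms
    if playing_ms <= 0:
        return None  # undefined when nothing was played
    return buffering_ms / playing_ms


def buffering_ratio_of_total(buffering_ms: int, total_ms: int) -> Optional[float]:
    """Time spent buffering divided by the total session duration."""
    if total_ms <= 0:
        return None
    return buffering_ms / total_ms


def weighted_average_bitrate(segments: List[Tuple[float, float]]) -> Optional[float]:
    """Average bitrate weighted by the seconds spent at each bitrate."""
    total_seconds = sum(seconds for _, seconds in segments)
    if total_seconds <= 0:
        return None
    return sum(bitrate * seconds for bitrate, seconds in segments) / total_seconds


# The table's own example: a 10 minute session with 5 minutes of buffering.
ten_min, five_min = 10 * 60 * 1000, 5 * 60 * 1000
print(buffering_ratio(five_min, ten_min))           # 1.0
print(buffering_ratio_of_total(five_min, ten_min))  # 0.5

# 60 s at bitrate 1000 and 20 s at bitrate 3000 (made-up values).
print(weighted_average_bitrate([(1000, 60), (3000, 20)]))  # 1500.0
```

The longer segment pulls the average towards its bitrate: an unweighted mean of the two bitrates would be 2000, while the weighted result is 1500.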