What is real-time analytics?
Real-time analytics is the use of, or the capacity to use, analytical data and resources as soon as the data enters the system. It can also be defined as a type of big data analytics in which data must be analyzed and processed as soon as it arrives; reporting and dynamic analysis can be performed on this data within sixty seconds of it entering the system. Batch processing techniques such as Hadoop provide higher throughput, whereas real-time technologies such as S4 and Storm process dynamic data as it arrives. Organizations looking to put big data to work need ways to analyze data from various sources effectively in real time or near real time. The ability to do this at scale and at speed gives an organization the freedom to react to events and make the adjustments needed to improve the business while the opportunity is still available; the joint effect of data service technologies and data processing tools has made this possible even for large workloads.
Real-time data analytics can be categorized into two types:
Ø On Demand – In this type of real-time data analytics, the analytics is not delivered unless a requestor issues a query. When an employee checks the status of an application, that counts as on-demand analytics; another example is a web analyst checking web traffic on a site in order to protect it.
Ø Continuous – This type of analytics is more proactive, as it keeps employees continuously updated with fresh results. Dynamic business intelligence can also be termed continuous real-time data analytics; an employee monitoring a graph of usage on a site is an example.
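The two modes can be contrasted in a few lines of Python. This is a toy sketch, not any real product's API: the `TrafficAnalytics` class, its methods and the alert threshold are all hypothetical.

```python
from collections import deque

class TrafficAnalytics:
    """Toy analytics engine over a stream of page-view events.

    Illustrative only: the class, methods and threshold rule are
    hypothetical, not taken from any real analytics product.
    """

    def __init__(self, alert_threshold, subscribers=None):
        self.hits = deque()
        self.alert_threshold = alert_threshold
        self.subscribers = subscribers or []   # continuous consumers

    def ingest(self, event):
        # Every arriving event is processed immediately (real time).
        self.hits.append(event)
        # Continuous analytics: push an update to subscribers on every event.
        for notify in self.subscribers:
            notify(len(self.hits))

    def traffic_report(self):
        # On-demand analytics: computed only when a requestor asks for it.
        return {"total_hits": len(self.hits),
                "suspicious": len(self.hits) > self.alert_threshold}

updates = []
engine = TrafficAnalytics(alert_threshold=3, subscribers=[updates.append])
for page in ["/home", "/login", "/login", "/login", "/login"]:
    engine.ingest(page)

report = engine.traffic_report()   # pulled only when the requestor asks
```

Here `updates` grows on every ingested event (continuous), while `report` is computed only at the moment of the request (on demand).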
Real-time data analytics has gained importance and become a necessity in industry because of the need to deal with the rapid increase in data and to respond proactively to triggering events.
Use cases of stream processing and real-time data analytics include the following:
Telecommunication: – CDR processing, social analysis, churn prediction and remapping are possible only because of real-time data analytics.
Transportation: – Intelligent traffic management and automotive telematics are possible because of real-time data analytics.
Energy and utilities: – Transactive control, phasor measurement units and downhole sensor monitoring are applications of real-time data analytics.
Health and life sciences: – ICU monitoring, epidemic warning systems and remote healthcare monitoring are applications of real-time data analytics.
Natural systems: – Wildlife management and water management are applications of real-time data analytics.
Law enforcement, defense and cyber security: – Real-time multimodal surveillance, situational awareness and cyber-security threat detection are applications of real-time data analytics.
Stock market: – Analyzing the impact of weather on security prices, low-latency market data analysis and momentum calculation would not have been possible without real-time data analytics.
Fraud Prevention: – Multi-party fraud detection and real-time fraud prevention are applications of real-time data analytics.
eScience: – Space weather prediction, transient event detection, synchrotron atomic research and genomic research are applications of real-time data analytics.
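As an illustration of the fraud-prevention use case, a minimal sliding-window rule can be sketched in Python. The threshold, window length and `process` function are hypothetical; real fraud systems combine many such signals in parallel.

```python
from collections import defaultdict, deque

# Hypothetical rule: flag a card if it makes more than 3 transactions
# inside any 60-second window.
WINDOW_SECONDS = 60
MAX_TXNS = 3

recent = defaultdict(deque)  # card id -> event timestamps inside the window

def process(card, ts):
    """Return True if this transaction looks fraudulent."""
    q = recent[card]
    q.append(ts)
    while q and ts - q[0] > WINDOW_SECONDS:   # evict expired events
        q.popleft()
    return len(q) > MAX_TXNS

stream = [("A", 0), ("A", 10), ("B", 12), ("A", 20), ("A", 30), ("A", 120)]
flags = [(card, process(card, ts)) for card, ts in stream]
```

Because the window is maintained incrementally, each transaction is scored the moment it arrives, which is exactly what "real-time fraud prevention" requires.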
Data Stream Processing platforms for real-time data analytics
For computing large volumes of high-velocity data, Storm can be used. It is a distributed real-time computation system that processes unbounded data streams with a simple processing model.
Through the combination of spouts, bolts and topologies, Storm provides a set of high-level abstractions that help in developing real-time applications. Joins, filters, grouping, aggregations and functions, along with incremental processing over any database, are provided by Storm.
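The spout/bolt/topology model can be illustrated with a tiny in-process word-count pipeline. This is plain Python that mimics Storm's abstractions; the class names echo Storm concepts but this is not the actual Storm API.

```python
from collections import Counter

class SentenceSpout:
    """Source of the stream: emits sentence tuples (Storm's 'spout')."""
    def __init__(self, sentences):
        self.sentences = sentences
    def emit(self):
        yield from self.sentences

class SplitBolt:
    """Stateless bolt: splits each sentence tuple into word tuples."""
    def process(self, sentence):
        yield from sentence.split()

class CountBolt:
    """Stateful bolt: keeps a running count per word."""
    def __init__(self):
        self.counts = Counter()
    def process(self, word):
        self.counts[word] += 1

# The "topology" wires spout -> split bolt -> count bolt.
spout, split, count = SentenceSpout(["to be or not to be"]), SplitBolt(), CountBolt()
for sentence in spout.emit():
    for word in split.process(sentence):
        count.process(word)
```

In real Storm the same wiring is declared once and the runtime distributes the bolts across workers; the counts here update incrementally per tuple rather than in a batch at the end.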
Apache Kafka provides a low-latency, high-throughput platform for real-time data feeds. Hundreds of megabytes of reads and writes per second, coming from thousands of clients, can be handled by one Kafka broker. Data streams are spread and partitioned over several machines to obtain high availability and horizontal scalability. For coordination of processing nodes, Kafka depends on ZooKeeper. Kafka can be used for applications needing low latency, high scalability and high availability.
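The partitioning idea behind that horizontal scalability can be sketched as follows. This models the concept only; it is not Kafka's implementation or client API, and the key names are invented.

```python
import zlib

# Toy model of a partitioned log: records are routed to a partition by a
# hash of their key, so the stream can be spread across several brokers
# while per-key ordering is preserved.
NUM_PARTITIONS = 3
partitions = [[] for _ in range(NUM_PARTITIONS)]

def produce(key, value):
    # Same key -> same partition, so all readings from one sensor stay
    # in order even though the stream is spread over machines.
    p = zlib.crc32(key.encode()) % NUM_PARTITIONS
    partitions[p].append((key, value))
    return p

placements = [produce("sensor-%d" % (i % 2), "reading-%d" % i)
              for i in range(6)]
```

Because routing is deterministic per key, consumers of a single partition see each sensor's readings in arrival order, which is the ordering guarantee Kafka offers within a partition.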
Flume is a distributed, available and reliable service for collecting, aggregating and moving large amounts of log data. It has a simple architecture based on streaming data flows. With its reliability, recovery and failover mechanisms, Flume is fault tolerant and robust, and its simple extensible model allows online analytical applications. Flume is best suited to simple event processing and to supporting data ingestion, but for CEP applications Kafka is better suited than Flume; many applications use the combination of Flume and Kafka for best results.
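Flume's channel-plus-failover behavior can be sketched in a few lines. The `FailoverAgent` class and the sink functions are hypothetical stand-ins, not the Flume API; they only illustrate why buffering in a channel makes the pipeline robust to a failing sink.

```python
from collections import deque

class FailoverAgent:
    """Toy sketch of Flume's source -> channel -> sink flow with failover."""

    def __init__(self, primary, backup):
        self.channel = deque()            # buffer between source and sinks
        self.primary, self.backup = primary, backup

    def source(self, event):
        self.channel.append(event)        # ingest: source writes to channel

    def drain(self):
        while self.channel:
            event = self.channel[0]
            try:
                self.primary(event)       # try the primary sink first
            except IOError:
                self.backup(event)        # failover: reroute to the backup
            self.channel.popleft()        # drop only after a sink accepted it

delivered, fallback = [], []
def flaky_sink(e):
    if e == "log-2":
        raise IOError("primary sink down")   # simulated outage
    delivered.append(e)

agent = FailoverAgent(flaky_sink, fallback.append)
for e in ["log-1", "log-2", "log-3"]:
    agent.source(e)
agent.drain()
```

The key design point mirrored here is that an event leaves the channel only after some sink has accepted it, which is how Flume avoids losing data during sink outages.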
For real-time data processing on cloud infrastructure, Amazon Kinesis is used. It can store terabytes of data per hour from thousands of sources. Kinesis can integrate with Storm, as it provides a Kinesis Storm spout that fetches data from a Kinesis stream and emits it as tuples. The combination of the two provides scalable and reliable stream capture, storage and replay.
Storage infrastructure options
Traditional On-Premise Storage versus Cloud Storage
In 2016, 83% of industry spending on storage went to traditional on-premise infrastructure and 17% to cloud infrastructure, whereas in 2017 the split was 58% on-premise versus 42% cloud.
On-premise Hadoop
In the on-premise Hadoop architecture above, we can see that HDFS is the file system, while Spark, Tez, Tachyon and YARN are the various platforms on top of HDFS. Spark, Spark Streaming, Spark SQL, Hive, MapReduce, Pig, Mahout, HBase, Storm and graph processing are the various applications running on these platforms. HDFS supports heterogeneous storage types and also provides end-to-end encryption, but it can operate in a single data center only, and disaster recovery is not solved in HDFS.
Cloud infrastructure for Big Data
The adoption of cloud infrastructure has become a necessity today, as applications are moving out of on-premise centers to lower cost and gain agility. For compute, AWS's major benefit is its EC2 instances, which can be combined with various options; Elastic Beanstalk, the EC2 Container Service, AWS Lambda and auto scaling are other services provided by AWS. For compute, Azure provides virtual machines (VMs), with tools like Cloud Services and Resource Manager that help in deploying applications on the cloud, as well as an auto-scaling service. Google's scalable Compute Engine provides VMs in Google's data centers; highly customizable, consistent in performance and backed by persistent disk storage, these are quick to boot. Azure, Amazon and Google Cloud support relational as well as NoSQL databases, the latter with Azure DocumentDB, Amazon DynamoDB and Google Bigtable. S3, Elastic Block Store and Elastic File System are all included in AWS; for Azure, Blob Storage along with Table, Queue and File storage are present, and Azure Backup, Import/Export and Site Recovery are all part of Azure.
AWS offers better breadth and depth: it ranks higher on configuration options, monitoring and policy features, security and reliability, and its openness and flexibility add to its advantages. However, the absence of a hybrid cloud strategy and the complexity of AWS are some of its shortcomings. In companies where Microsoft already has a good hold, Microsoft can easily help with the transition to the cloud, and Azure works well with the hybrid approach. Though both Azure and AWS have PaaS capabilities, Microsoft is a bit better here than Amazon. Whereas Amazon provides options for many supporting platforms, running something other than Windows Server on Azure can cause problems. Google Cloud has done well in the open-source community but has not yet made its mark at the industry level.
Thus, AWS
maintains its lead, providing a wider range of functionality and greater maturity than the other two platforms. Its enterprise-friendly nature, with a wide range of tools and services, makes it a better choice for organizations. But Microsoft is not far behind: it has started closing the gap with AWS, and companies that have invested heavily in Microsoft and its products would opt for Azure over AWS. Then comes Google, which is slowly making progress with a small range of customers but has a lot of work to do.
Future research challenges
A major challenge is how to apply the best analytical techniques, such as machine learning and statistics, to streaming data. The next big challenge is the lack of a standardized format for interoperation between systems across the different layers of real-time analytics stacks. Another challenge is developing easy-to-understand solutions that reflect present methods of analysis on the application front.
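As a concrete instance of the first challenge, even a basic statistic such as variance must be reformulated to work one element at a time over a stream. Welford's online algorithm, sketched below, is one standard way to do this in constant memory per update; the class name is illustrative.

```python
class OnlineStats:
    """Welford's online algorithm: running mean and variance over a stream,
    updated in O(1) memory per element instead of storing the whole dataset."""

    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)   # uses the *updated* mean

    @property
    def variance(self):
        # Sample variance; defined only once two elements have arrived.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

stats = OnlineStats()
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(x)
```

Extending richer techniques (regression, clustering, model training) to this single-pass, bounded-memory setting is precisely what makes machine learning over streams an open problem.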