Learning Apache Spark 2
Packt Publishing
ISBN-13: 9781785885136
$55.90
Learn about the fastest growing open source project in the world, and how it revolutionizes big data analytics

About This Book
* Exclusive guide that covers how to get up and running with fast data processing using Apache Spark
* Explore and exploit various possibilities with Apache Spark using real-world use cases in this book
* Want to perform efficient data processing in real time? This book will be your one-stop solution.

Who This Book Is For
This guide appeals to big data engineers, analysts, architects, software engineers, and even technical managers who need to perform efficient data processing on Hadoop in real time. Basic familiarity with Java or Scala will be helpful. Readers are assumed to come from mixed backgrounds, but will typically have a background in engineering or data science and want to understand how Spark can help them on their analytics journey.

What You Will Learn
* Get an overview of big data analytics and its importance for organizations and data professionals
* Delve into Spark to see how it differs from existing processing platforms
* Understand the intricacies of various file formats, and how to process them with Apache Spark
* Learn how to deploy Spark with YARN, Mesos, or a standalone cluster manager
* Learn the concepts of Spark SQL, SchemaRDD, caching, and Spark UDFs, and how to work with Hive and Parquet file formats
* Understand the architecture of Spark MLlib while discussing some of the off-the-shelf algorithms that come with Spark
* Get introduced to SparkR and walk through the details of data munging, including selecting, aggregating, and grouping data using RStudio
* Walk through the importance of graph computation and the graph processing systems available in the market
* See a real-world example of Spark by building a recommendation engine using collaborative filtering
* Use a telco data set to predict customer churn using regression

In Detail
The Spark juggernaut keeps on rolling and gains more momentum each day. The core challenge is understanding the key capabilities of Spark (Spark SQL, Spark Streaming, Spark ML, SparkR, and GraphX). Having understood these capabilities, it is important to understand how Spark can be used: installed as a standalone framework, or as part of an existing Hadoop installation and configured with YARN or Mesos.

The next part of the journey after installation is using the key components: APIs, clustering, machine learning APIs, data pipelines, and parallel programming. It is important to understand why each framework component is key, how widely it is used, how stable it is, and what its pertinent use cases are.

Once we understand the individual components, we will take a couple of real-life advanced analytics examples:
* Building a recommendation system
* Predicting customer churn

The objective of these real-life examples is to give the reader confidence in using Spark for real-world problems.
- Author: Asif Abbasi
- Publisher: Packt Publishing
- Publication Date: Mar 24, 2017
- Number of Pages: 356 pages
- Language: English
- Binding: Paperback
- ISBN-10: 1785885138
- ISBN-13: 9781785885136