By Padma Priya Chitturi
- Use Apache Spark for data processing with these hands-on recipes
- Implement end-to-end, large-scale data analysis better than ever before
- Work with powerful libraries such as MLlib, SciPy, NumPy, and Pandas to gain insights from your data
Spark has emerged as the most promising big data analytics engine for data science professionals. The true power and value of Apache Spark lies in its ability to execute data science tasks with speed and accuracy. Spark's selling point is that it combines ETL, batch analytics, real-time stream analysis, machine learning, graph processing, and visualizations. It lets you tackle the complexities that come with raw, unstructured data sets with ease.
This guide will get you comfortable and confident performing data science tasks with Spark. You will learn about implementations including distributed deep learning, numerical computing, and scalable machine learning. You will be shown effective solutions to problematic concepts in data science using Spark's data science libraries such as MLlib, Pandas, NumPy, SciPy, and more. These simple and efficient recipes will show you how to implement algorithms and optimize your work.
What you will learn
- Explore the topics of data mining, text mining, Natural Language Processing, information retrieval, and machine learning.
- Solve real-world analytical problems with large data sets.
- Address data science challenges with analytical tools on a distributed system like Spark (apt for iterative algorithms), which offers in-memory processing and more flexibility for data analysis at scale.
- Get hands-on experience with algorithms like classification, regression, and recommendation on real datasets using the Spark MLlib package.
- Learn about numerical and scientific computing using NumPy and SciPy on Spark.
- Use Predictive Model Markup Language (PMML) in Spark for statistical data mining models.
About the Author
Padma Priya Chitturi is Analytics Lead at Fractal Analytics Pvt Ltd and has over five years of experience in Big Data processing. Currently, she is part of capability development at Fractal and is responsible for solution development for analytical problems across multiple business domains at large scale. Prior to this, she worked for an airlines product on a real-time processing platform serving one million user requests/sec at Amadeus Software Labs. She has worked on realizing large-scale deep networks (Jeffrey Dean's work at Google Brain) for image classification on the big data platform Spark. She works closely with Big Data technologies such as Spark, Storm, Cassandra and Hadoop. She was an open source contributor to Apache Storm.
Table of Contents
- Big Data Analytics with Spark
- Tricky Statistics with Spark
- Data Analysis with Spark
- Clustering, Classification, and Regression
- Working with Spark MLlib
- NLP with Spark
- Working with Sparkling Water - H2O
- Data Visualization with Spark
- Deep Learning on Spark
- Working with SparkR
Read Online or Download Apache Spark for Data Science Cookbook PDF
Best data modeling & design books
This third volume of the best-selling "Data Model Resource Book" series revolutionizes the data modeling discipline by answering the question "How can you save significant time while improving the quality of any type of data modeling effort?" Unlike the first two volumes, this new volume focuses on the fundamental, underlying patterns that affect over 50 percent of most data modeling efforts.
HCI Models, Theories, and Frameworks provides a thorough pedagogical survey of the science of Human-Computer Interaction (HCI). HCI spans many disciplines and professions, including anthropology, cognitive psychology, computer graphics, graphical design, human factors engineering, interaction design, sociology, and software engineering.
Modelling and Precision Control of Systems with Hysteresis covers the piezoelectric and other smart materials that are increasingly employed as actuators in precision engineering, from scanning probe microscopes (SPMs) in life science and nano-manufacturing, to precision active optics in astronomy, including space laser communication, space imaging cameras, and micro-electro-mechanical systems (MEMS).
This book focuses on recent research in modern optimization and its implications in control and data analysis. It is a collection of papers from the conference “Optimization and Its Applications in Control and Data Science” dedicated to Professor Boris T. Polyak, which was held in Moscow, Russia on May 13-15, 2015.