Apache Spark and Scala Tutorials

Welcome to the Apache Spark and Scala tutorial. The objective of these tutorials is to provide an in-depth understanding of Apache Spark and Scala.

In addition to the free Apache Spark and Scala tutorial, we will cover common interview questions, issues, and how-tos for Apache Spark and Scala.

Introduction

Spark is an open source project that has been built and is maintained by a thriving and diverse community of developers. Spark started in 2009 as a research project in the UC Berkeley RAD Lab, later to become the AMPLab. It was observed that MapReduce was inefficient for some iterative and interactive computing jobs, and Spark was designed in response. Spark’s aim is to be fast for interactive queries and iterative algorithms, bringing support for in-memory storage and efficient fault recovery. Iterative algorithms have always been hard for MapReduce, requiring multiple passes over the same data.

Apache Spark is a cluster computing technology designed for fast computation. It builds on the MapReduce model popularized by Hadoop and extends it to efficiently support more types of computation, including interactive queries and stream processing. The main feature of Spark is its in-memory cluster computing, which increases the processing speed of an application.

Spark is designed to cover a wide range of workloads such as batch applications, iterative algorithms, interactive queries, and streaming. By supporting all these workloads in a single system, it reduces the management burden of maintaining separate tools.

Features of Apache Spark

-Speed − Spark can run an application in a Hadoop cluster up to 100 times faster than Hadoop MapReduce in memory, and up to 10 times faster when running on disk. This is possible because it reduces the number of read/write operations to disk by storing intermediate processing data in memory.

-Supports multiple languages − Spark provides built-in APIs in Java, Scala, and Python, so you can write applications in different languages. Spark also comes with over 80 high-level operators for interactive querying.

-Advanced Analytics − Spark supports more than just ‘map’ and ‘reduce’. It also supports SQL queries, streaming data, machine learning (ML), and graph algorithms.

Components of Spark

Apache Spark Core

Spark Core is the underlying general execution engine for the Spark platform, upon which all other functionality is built. It provides in-memory computing and the ability to reference datasets in external storage systems.
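
As a rough illustration of what Spark Core offers, the sketch below builds an RDD, keeps an intermediate result in memory with cache(), and reduces it to a single value. It assumes the Spark libraries are on the classpath and runs in local mode; the object name RddSketch and the helper sumOfSquares are made up for this example.

```scala
import org.apache.spark.sql.SparkSession

object RddSketch {
  // Hypothetical helper: sums the squares of 1..upTo (upTo >= 1) using an RDD.
  def sumOfSquares(spark: SparkSession, upTo: Int): Long = {
    // Distribute a local range across the cluster (or local threads).
    // A dataset in external storage could be referenced with sc.textFile(path) instead.
    val numbers = spark.sparkContext.parallelize(1 to upTo)

    // cache() asks Spark to keep the intermediate RDD in memory once it is computed,
    // so later actions on `squares` can reuse it instead of recomputing it.
    val squares = numbers.map(n => n.toLong * n).cache()

    // reduce is an action: it triggers the computation and returns a value to the driver.
    squares.reduce(_ + _)
  }

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside this JVM, using all available cores.
    val spark = SparkSession.builder().appName("rdd-sketch").master("local[*]").getOrCreate()
    try println(sumOfSquares(spark, 100))
    finally spark.stop()
  }
}
```

Transformations such as map are lazy, so nothing runs until the reduce action is called.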

Spark SQL

Spark SQL is a component on top of Spark Core that introduces a data abstraction called SchemaRDD (renamed DataFrame in Spark 1.3), which provides support for structured and semi-structured data.
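
A minimal sketch of querying structured data with Spark SQL follows. It uses the DataFrame API, which is the name current Spark releases use for the structured abstraction described above, and it assumes the Spark SQL libraries are on the classpath. The Person case class, the "people" view name, and the adultNames helper are illustrative assumptions only.

```scala
import org.apache.spark.sql.SparkSession

object SqlSketch {
  // Hypothetical record type; Spark derives the schema (name, age) from it.
  final case class Person(name: String, age: Int)

  // Hypothetical helper: returns the names of people aged 18 or over, sorted by name.
  def adultNames(spark: SparkSession, people: Seq[Person]): List[String] = {
    import spark.implicits._ // brings toDF() and the String encoder into scope

    // Turn the local collection into a DataFrame and expose it to SQL as a temporary view.
    val df = people.toDF()
    df.createOrReplaceTempView("people")

    spark.sql("SELECT name FROM people WHERE age >= 18 ORDER BY name")
      .as[String]
      .collect()
      .toList
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("sql-sketch").master("local[*]").getOrCreate()
    try println(adultNames(spark, Seq(Person("Ann", 31), Person("Bob", 12), Person("Cy", 18))))
    finally spark.stop()
  }
}
```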

Spark Streaming

Spark Streaming leverages Spark Core's fast scheduling capability to perform streaming analytics. It ingests data in mini-batches and performs RDD (Resilient Distributed Datasets) transformations on those mini-batches of data.
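
The mini-batch idea can be sketched with plain Scala collections. This is not the Spark Streaming API: it only mimics the pattern of cutting a stream into small batches and applying the same transformation to each one. Spark Streaming forms its batches by time interval, while this sketch uses a fixed element count for simplicity. The names MiniBatchSketch and countPerBatch are hypothetical.

```scala
object MiniBatchSketch {
  // Hypothetical helper: splits a finite sequence of events into mini-batches of
  // `batchSize` elements and computes a word count for each batch independently.
  def countPerBatch(events: Seq[String], batchSize: Int): List[Map[String, Int]] =
    events
      .grouped(batchSize) // one group per mini-batch; the last one may be smaller
      .map { batch =>
        // The same transformation is applied to every batch.
        batch.groupBy(identity).map { case (word, hits) => word -> hits.size }
      }
      .toList

  def main(args: Array[String]): Unit =
    countPerBatch(Seq("a", "b", "a", "c", "c"), batchSize = 3).foreach(println)
}
```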

MLlib (Machine Learning Library)

MLlib is a distributed machine learning framework on top of Spark that benefits from Spark's distributed memory-based architecture. According to benchmarks done by the MLlib developers against the Alternating Least Squares (ALS) implementations, Spark MLlib is nine times as fast as the Hadoop disk-based version of Apache Mahout (before Mahout gained a Spark interface).

GraphX

GraphX is a distributed graph-processing framework on top of Spark. It provides an API for expressing graph computations that can model user-defined graphs using the Pregel abstraction API. It also provides an optimized runtime for this abstraction.

Scala Introduction

Scala is a general-purpose programming language designed to express common programming patterns in a concise, elegant, and type-safe way. It integrates very well with the Java platform.

Scala stands for "scalable language". It is a modern multi-paradigm programming language that combines functional and object-oriented programming. Object orientation makes it simple to design complex systems and adapt them to new demands, while functional programming makes it simple to build things rapidly from simple parts. Scala is also compatible with Java.
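
A small sketch of how the two paradigms can sit side by side: the types are modelled in an object-oriented way, and the operations on them are written as functions over immutable lists. All names here (ShapesSketch, Shape, Circle, Rect, totalArea, largerThan) are invented for illustration.

```scala
object ShapesSketch {
  // Object-oriented side: a small type hierarchy with a shared interface.
  sealed trait Shape { def area: Double }
  final case class Circle(radius: Double) extends Shape {
    def area: Double = math.Pi * radius * radius
  }
  final case class Rect(width: Double, height: Double) extends Shape {
    def area: Double = width * height
  }

  // Functional side: behaviour is composed from small functions over immutable data.
  def totalArea(shapes: List[Shape]): Double = shapes.map(_.area).sum

  def largerThan(minArea: Double)(shapes: List[Shape]): List[Shape] =
    shapes.filter(_.area > minArea)

  def main(args: Array[String]): Unit = {
    val shapes = List(Rect(2, 3), Rect(1, 1), Circle(1))
    println(totalArea(shapes))
    println(largerThan(3.0)(shapes))
  }
}
```

Adding a new shape only requires a new case class, while new operations are just new functions, which is one way to read the "adapt to new demands" and "build from simple parts" points above.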

It adopts a large part of the syntax of Java and C. Beyond syntax, Scala takes other elements from Java, such as its basic types, class libraries, and execution model. It is designed to convey general programming patterns in an elegant, brief, and type-safe way.

Scala has two main features:
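
To illustrate the reuse of Java class libraries, the sketch below calls java.util.ArrayList and java.util.Collections directly from Scala. It assumes Scala 2.13 or later for scala.jdk.CollectionConverters (older versions would use scala.collection.JavaConverters instead); the object name JavaInteropSketch and the helper sortedUpper are hypothetical.

```scala
import scala.jdk.CollectionConverters._

object JavaInteropSketch {
  // Hypothetical helper: upper-cases words and sorts them using Java's own collections.
  def sortedUpper(words: Seq[String]): List[String] = {
    // A plain Java collection, instantiated from Scala like any other class.
    val javaList = new java.util.ArrayList[String]()
    words.foreach(w => javaList.add(w.toUpperCase)) // toUpperCase is java.lang.String's method

    // A static Java method, called with Scala syntax.
    java.util.Collections.sort(javaList)

    // Convert back to an immutable Scala List.
    javaList.asScala.toList
  }

  def main(args: Array[String]): Unit =
    println(sortedUpper(Seq("spark", "scala", "java")))
}
```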

-Object Oriented

-Functional

Scala Notation

-‘_’ is the default value or wildcard.

-‘=>’ is used to separate a match expression from the block to be evaluated.

-The anonymous function ‘(x,y) => x+y’ can be replaced by ‘_+_’.

-‘v=>v.Method’ can be replaced by ‘_.Method’.

-"->" is the tuple delimiter.

Iteration with for:

    for (i <- 0 until 10) { // with 0 to 10, 10 is included
      println(s"Item: $i")
    }

Examples:

    import scala.collection.immutable._
    lsts.filter(v=>v.length>2) is the same as lsts.filter(_.length>2)
    (2, 3) is equal to 2 -> 3
    2 -> (3 -> 4) == (2,(3,4))
    2 -> 3 -> 4 == ((2,3),4)
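
The notation rules above can be checked with a small runnable sketch. The sample list contents and the names NotationSketch and describe are assumptions added for illustration; lsts mirrors the identifier used in the examples.

```scala
object NotationSketch {
  val lsts: List[String] = List("a", "spark", "io", "scala")

  // Placeholder syntax: both filters select the strings longer than two characters.
  val explicitFilter: List[String] = lsts.filter(v => v.length > 2)
  val placeholderFilter: List[String] = lsts.filter(_.length > 2)

  // (x, y) => x + y can be shortened to _ + _.
  val sumExplicit: Int = List(1, 2, 3).reduce((x, y) => x + y)
  val sumPlaceholder: Int = List(1, 2, 3).reduce(_ + _)

  // -> builds a Tuple2 and associates to the left.
  val pair: (Int, Int) = 2 -> 3
  val rightNested: (Int, (Int, Int)) = 2 -> (3 -> 4)
  val leftNested: ((Int, Int), Int) = 2 -> 3 -> 4

  // => separates each pattern from its result; _ is the wildcard case.
  def describe(x: Any): String = x match {
    case 0         => "zero"
    case s: String => s"text: $s"
    case _         => "something else"
  }

  // until excludes the upper bound; to includes it.
  val untilItems: List[String] = (for (i <- 0 until 3) yield s"Item: $i").toList
  val toCount: Int = (0 to 10).size

  def main(args: Array[String]): Unit = {
    println(placeholderFilter)
    println(leftNested)
    untilItems.foreach(println)
  }
}
```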

About Author
Name
TekSlate

