KT-WA2610 Machine Learning with Apache Spark Training


To stay competitive, organizations are adopting new approaches to data processing and analysis. For example, data scientists are turning to Apache Spark to process massive amounts of data, using its distributed compute capability and its built-in machine learning library.

This intensive Apache Spark training course provides an overview of data science algorithms as well as the theoretical and technical aspects of using the Apache Spark platform for machine learning. Hands-on labs help attendees reinforce the material covered.


Topics

  • Applied Data Science and Business Analytics
  • Machine Learning Algorithms, Techniques and Common Analytical Methods
  • Apache Spark Introduction
  • Spark’s MLlib Machine Learning Library

This Apache Spark training course includes three hands-on labs, outlined below. The labs cover the spark-submit tool and the Apache Spark shell, and allow you to practice the following skills:

Lab 1 - Using the spark-submit Tool

Spark offers developers two ways of running applications:

  • Using the spark-submit tool
  • Using Spark Shell

In this lab, we will review what is involved in using the spark-submit tool.
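
As a rough illustration of what a spark-submit invocation involves, the sketch below assembles a command line in Python. The helper name `build_spark_submit`, the application file `my_app.py`, and the argument values are hypothetical and are not taken from the lab materials; `--master`, `--name`, and `--conf` are standard spark-submit options.

```python
import shlex


def build_spark_submit(app, master="local[2]", name=None, conf=None, app_args=()):
    """Return a spark-submit command as a list of arguments.

    Hypothetical helper for illustration only; it builds the command
    but does not run it.
    """
    cmd = ["spark-submit", "--master", master]
    if name:
        cmd += ["--name", name]
    # Each Spark property is passed as a separate --conf key=value pair.
    for key, value in sorted((conf or {}).items()):
        cmd += ["--conf", "%s=%s" % (key, value)]
    # The application file comes after the options, followed by its own arguments.
    cmd.append(app)
    cmd.extend(app_args)
    return cmd


cmd = build_spark_submit(
    "my_app.py",                      # assumed application script
    name="Lab1",
    conf={"spark.executor.memory": "1g"},
    app_args=["input.txt"],           # assumed application argument
)

# Print a shell-safe version of the command for copy and paste.
print(" ".join(shlex.quote(part) for part in cmd))
```

On a machine with Spark installed, the resulting list could be passed to `subprocess.run(cmd, check=True)` to launch the application.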

Lab 2 - The Apache Spark Shell

Spark provides an interactive development environment through the Spark Shell, a REPL (Read/Eval/Print Loop) tool that is available for Scala and Python developers (Java is not yet supported).
The lab instructions apply to the Scala version of the Spark Shell.

Lab 3 - Using Random Forests for Classification with Spark MLlib

In this lab, we will learn how to use the Random Forests implementation from Spark's machine learning library, MLlib, to perform object classification.
The Random Forests algorithm is regarded as one of the most successful supervised learning algorithms and can be used for both classification and regression.
We will use the Python version of the library, which provides an API similar to those available in Scala and Java.
We will also use the spark-submit tool to submit the application from the command line rather than typing commands into the Spark Shell.
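
A minimal sketch of such an application is shown below, using the RDD-based `pyspark.mllib` API. It is not the lab's actual code: the input format (one `label,feature1,feature2,...` record per line), the two-class setup, the 70/30 split, and all hyperparameter values are assumptions, and `parse_line` and `main` are hypothetical names.

```python
import sys


def parse_line(line):
    """Split an assumed 'label,f1,f2,...' record into (label, [features])."""
    values = [float(x) for x in line.split(",")]
    return values[0], values[1:]


def main(path):
    # Spark imports live here so the parsing helper can be used without Spark.
    from pyspark import SparkContext
    from pyspark.mllib.regression import LabeledPoint
    from pyspark.mllib.tree import RandomForest

    sc = SparkContext(appName="RandomForestSketch")
    rows = sc.textFile(path).map(parse_line)
    data = rows.map(lambda r: LabeledPoint(r[0], r[1]))

    # Hold out part of the data to estimate the classification error.
    train, test = data.randomSplit([0.7, 0.3], seed=42)

    # Assumed settings: binary labels, no categorical features, 10 trees.
    model = RandomForest.trainClassifier(
        train,
        numClasses=2,
        categoricalFeaturesInfo={},
        numTrees=10,
        featureSubsetStrategy="auto",
        impurity="gini",
        maxDepth=4,
        maxBins=32,
        seed=42,
    )

    # Predict on the held-out features and pair each prediction with its label.
    predictions = model.predict(test.map(lambda p: p.features))
    labels_and_preds = test.map(lambda p: p.label).zip(predictions)
    errors = labels_and_preds.filter(lambda lp: lp[0] != lp[1]).count()
    print("Test error: %.3f" % (errors / float(test.count())))
    sc.stop()


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Saved under a hypothetical name such as `random_forest_sketch.py`, the script could be submitted with `spark-submit random_forest_sketch.py data.csv`, where `data.csv` is an assumed input file in the format described above.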

The Web Age Spark class can be delivered in a traditional classroom format. It can also be delivered in a synchronous, instructor-led format.


Audience

  • Data Scientists
  • Business Analysts
  • Software Developers
  • IT Architects


Participants should have general knowledge of statistics and programming.

Duration: 1 day



Course Schedule

To inquire about future classes, request a class date if one is not scheduled.