HDP Analyst Data Science is unfortunately unavailable

Thankfully we have 3 other Hadoop Classes for you to choose from. Check our top choices below or see all classes for more options.

HDP Analyst Data Science

at Sunset Learning Institute - Yerba Buena

Course Details
Start Date:

This class isn't on the schedule at the moment, but save it to your Wish List to find out when it comes back!
If you're enrolled in an upcoming date, this simply means that date has now sold out.

Yerba Buena
33 New Montgomery St Ste 300
Btwn Market & Stevenson Streets
San Francisco, California 94105
Class Level: All levels
Age Requirements: 18 and older
Average Class Size: 8

What you'll learn in this Hadoop training:

This course is designed for students who want to become familiar with the processes and practice of data science, including machine learning and natural language processing. It covers the following tools and programming languages: Python, IPython, Mahout, Pig, NumPy, pandas, SciPy, and scikit-learn, as well as the Natural Language Toolkit (NLTK) and Spark MLlib.

Target Audience

Computer science and data analytics students who need to apply data science and machine learning on Hadoop.


Prerequisites

Students must have experience with at least one programming or scripting language, knowledge of statistics and/or mathematics, and a basic understanding of big data and Hadoop principles.

Course Objectives

By the end of class, students will be able to:
  • Recognize use cases for data science 
  • Describe the architecture of Hadoop and YARN 
  • Describe the differences between supervised and unsupervised learning
  • List the six machine learning tasks 
  • Use Mahout to run a machine learning algorithm on Hadoop 
  • Use Pig to transform and prepare data on Hadoop 
  • Write a Python script 
  • Use NumPy to analyze big data 
  • Use the data structure classes in the pandas library 
  • Write a Python script that invokes SciPy machine learning
  • Describe options for running Python code on a Hadoop cluster 
  • Write a Pig User-Defined Function in Python 
  • Use Pig streaming on Hadoop with a Python script 
  • Write a Python script that invokes scikit-learn 
  • Use the k-nearest neighbor algorithm to predict values 
  • Run a machine learning algorithm on a distributed data set 
  • Describe use cases for Natural Language Processing (NLP)
  • Perform sentence segmentation on a large body of text 
  • Perform part-of-speech tagging 
  • Use the Natural Language Toolkit (NLTK) 
  • Describe the components of a Spark application 
  • Write a Spark application in Python 
  • Run machine learning algorithms using Spark MLlib

School Notes:
All Sunset Learning Institute classes are taught in an instructor-led, live virtual environment. This price includes a facility fee of $300 (except Reston, VA or Denver, CO) to allow you to take it in a classroom-type environment. If you prefer to take the class from home, the $300 fee will be waived and refunded.


Refund Policy
Refunds are available up to 2 weeks prior to the class start date.




Benefits of Booking Through CourseHorse

Booking is safe. When you book with us your details are protected by a secure connection.
Lowest price guaranteed. Classes on CourseHorse are never marked up.
This class will earn you 22950 points. Points give you money off your next class!

Reviews of Classes at Sunset Learning Institute (4)

School: Sunset Learning Institute


Sunset Learning Institute exists to provide world-class IT training by delivering exceptional experiences to our customers. Our goal is to help clients optimize their Cisco hardware investment through a consultative approach that allows us to deliver the highest quality advanced training in the marketplace,...


CourseHorse Approved

This school has been carefully vetted by CourseHorse and is a verified SF educator.


3 Top Choices

Big Data: Hadoop and Spark

This class is temporarily being offered remotely.

at GreyCampus - Online, New York, New York 00000

The GreyCampus Big Data Hadoop & Spark training course is designed by industry experts and provides in-depth knowledge of the big data framework using Hadoop tools (such as HDFS and YARN, among others) and Spark. This online instructor-led course is a stepping stone for learners who want to work on big data projects. What You Get ...

Saturday Jun 27th, 10am - 1pm Eastern Time

  (7 sessions)


Big Data with Amazon Cloud, Hadoop/Spark and Docker

This class is temporarily being offered remotely.

at NYC Data Science Academy - Virtual Classroom

This is a 6-week evening program providing a hands-on introduction to the Hadoop and Spark ecosystem of Big Data technologies. The course will cover these key components of Apache Hadoop: HDFS, MapReduce with streaming, Hive, and Spark. Programming will be done in Python. The course will begin with a review of Python concepts needed for our examples....

Tuesday Jun 23rd, 7pm - 9:30pm Eastern Time

  (12 sessions)


Data Science Immersive

This class is temporarily being offered remotely.

at General Assembly - Penn Quarter 509 7th St NW 3rd Fl, Washington, District of Columbia 20004

This is a full-time course. A Well-Rounded Technical Foundation: Get hands-on training with the essentials of data science: data mining, statistical modeling, machine learning, and the Python programming language. Apply advanced techniques such as recommender systems, neural networks, and computer vision models to power business forecasts and drive...

Monday Jun 15th, 10am - 6pm Eastern Time

  (62 sessions)
