Big Data with Amazon Cloud, Hadoop/Spark and Docker
- Beginner
- 18 and older
- $2,840.50
- NYC Data Science Academy
- 30 hours over 12 sessions
This is a 6-week evening program that provides a hands-on introduction to the Hadoop and Spark ecosystem of Big Data technologies. The course covers these key components of Apache Hadoop: HDFS, MapReduce with streaming, Hive, and Spark. All programming is done in Python, and the course begins with a review of the Python concepts needed for the examples. The format is interactive, so students need to bring laptops to class. Work is done on AWS (Amazon Web Services); instructions on how to obtain an account and connect to AWS will be provided ahead of time.
What is Hadoop?
Hadoop is a set of open-source programs that run on computer clusters and simplify the handling of large amounts of data. Originally, Hadoop consisted of a distributed file system tuned for large data sets and an implementation of the MapReduce parallelism paradigm, but it has since expanded in many ways. It now includes database systems, languages for parallelism, libraries for machine learning, its own job scheduler, and much more. Furthermore, MapReduce is no longer the only parallelism framework; Spark is an increasingly popular alternative. In summary, Hadoop is a very popular and rapidly growing set of cluster computing solutions that is becoming an essential tool for data scientists.
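To make the MapReduce paradigm mentioned above more concrete, here is a minimal single-machine sketch in plain Python. It is not Hadoop code and uses no Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample lines are assumptions made up for illustration. On a real cluster the framework would run the map and reduce steps in parallel across machines and perform the grouping step itself.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input, not taken from the course materials.
    sample = ["big data with hadoop", "big data with spark"]
    print(word_count(sample))
```

The map and reduce functions never share state, which is what would allow a framework to distribute them across many machines.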
Prerequisites
To get the most out of the class, you need to be familiar with Linux file systems, the Linux command line interface (CLI), and basic Linux commands such as cd, ls, and cp. You also need basic programming skills in Python and should be comfortable with a functional programming style, for example, knowing how to use the map() function to split a list of strings into a nested list. Object-oriented programming (OOP) in Python is not required.
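As a quick self-check for the functional programming prerequisite, the map() example mentioned above might look like the following sketch. The helper name `split_rows` and the sample strings are hypothetical and chosen only for illustration.

```python
def split_rows(rows, sep=None):
    """Split each string in rows into a list of fields, giving a nested list.

    With sep=None, str.split splits on any run of whitespace.
    """
    # map() returns a lazy iterator in Python 3, so wrap it in list().
    return list(map(lambda row: row.split(sep), rows))


if __name__ == "__main__":
    # Made-up sample records.
    records = ["alice 34 nyc", "bob 27 boston"]
    print(split_rows(records))
```

If this reads naturally to you, including the lambda and the need to wrap map() in list(), you likely meet the Python prerequisite.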
Syllabus
Unit 1 – Introduction to Hadoop
1. Data Engineering Toolkits
2. Hadoop and MapReduce
Unit 2 – MapReduce
3. MapReduce using MRJob 1
4. MapReduce using MRJob 2
Unit 3 – Apache Hive
5. Apache Hive 1
6. Apache Hive 2
Unit 4 – Apache Pig
7. Apache Pig 1
8. Apache Pig 2
Unit 5 – Apache Spark and AWS
9. Apache Spark – Spark Core
10. Apache Spark – Spark SQL
11. Apache Spark – Spark ML
12. Amazon Elastic MapReduce
Project: Data Engineering Project
This course is available for remote learning to anyone with an internet-connected device that has a microphone (this includes most computers and tablets). Classes take place with a live instructor at the dates and times listed below.
Upon registration, the instructor will send additional information about how to log on and participate in the class.
Any student wishing to withdraw from a program must notify CourseHorse in writing. The date of withdrawal for refund purposes is the last date of physical attendance. Failure to notify us in writing of a withdrawal may delay the refund of tuition due pursuant to Sections 5001 and 5002 of the Education Law.
Any student requesting cancellation within seven days after signing the Enrollment Agreement, but before instruction begins, will be refunded all money paid less a 5% cancellation fee. Thereafter, in the event of cancellation or termination by the school, refunds will be prorated based on the student's last date of attendance.
Students who attended classes at NYC Data Science Academy on Big Data with Hadoop/Spark and Docker found the courses to be well-organized and informative. The curriculum covered a range of topics, including R and Python programming, statistical analysis, machine learning, and big data tools like Hadoop and Spark. The breadth of topics and difficulty level were suitable for those with academic backgrounds, and the courses prepared them well for further exploration in data science.
The bootcamp was described as intense, both physically and mentally, but students found support from instructors, TAs, and classmates. The job assistance provided by the academy was also praised, with services including networking, resume editing, and mock interviews. Overall, students felt that attending NYC Data Science Academy was a valuable decision that provided them with the skills and confidence to pursue careers in data science.
Quotes:
1. "The bootcamp courses weren't supposed to teach you everything, but they did prepare me very well if I wanted to explore further data science topics."
2. "I was able to see people applying data science tools to their expertise brilliantly, fashion, marketing, IT, health care... It was very helpful for me who was looking to step outside of academia."
3. "I appreciate the knowledge, skills, and support I acquired from NYC Data Science Academy. I highly recommend NYC Data Science Academy to anyone interested in this career."
NYC Data Science Academy is a program designed to teach those who wish to learn.
Through hands-on projects and real-world applications, our students develop the skills they will need to pursue data science as both a hobby and profession. We also organize the NYC Open Data Meetup, which means that by...
This school has been carefully vetted by CourseHorse and is a verified online educator.