Introduction to Big Data Hadoop (Cloudera)

Level: Beginner

Duration: 3 days

Certification: Certificate of participation

Who is this course for?

The course is recommended for anyone who wants to understand Big Data through the Apache Hadoop architecture and its different commercial distributions.

Prerequisite knowledge and skills
  • Knowledge of the differences between SQL and NoSQL, distributed systems concepts, and networking concepts
Course overview

This course aims to provide an understanding of Big Data through the Apache Hadoop architecture and to discuss its different commercial distributions.

Topics covered in the course
  1. Intro to Big Data
    • What is Big Data, common factors of Big Data, use cases
    • Big Data Architecture evolution
    • Lambda and Kappa architecture overview
  2. Hadoop architecture/ecosystem intro (1.5 days)
    • What is Hadoop, HDFS emergence and MapReduce evolution
    • Use cases of Hadoop and Hadoop in real life
    • Hadoop framework: general overview, detailed overview, and applicability cases:
      • HDFS & MapReduce essentials, YARN
      • HDFS concepts & components, redundancy, etc.
      • MapReduce architecture, mappers, reducers, combiners, partitioners
    • Apache Hive and Apache Impala intro and hands-on exercises
    • Understanding the role of file formats in Hadoop: Apache Avro, Parquet, ORC (hands-on exercises)
  3. Data storage (0.5 days)
    • Architecting data in Hadoop: storage option considerations, understanding the difference between file storage and NoSQL databases
    • Apache HBase intro and hands-on exercises (using Hive)
  4. Data computing & data analysis (0.5 days)
    • Apache Spark intro
    • MapReduce and Apache Spark
    • Start working with Spark: create a DataFrame from an HDFS file, transformations & actions, operations on DataFrames with Scala and SQL (Spark SQL); hands-on exercises
    • Data analysis: SQL on Hadoop with Hive, Impala, and Spark SQL
  5. Data ingestion (0.5 days)
    • What solutions exist in the ingestion layer
    • What messaging bus solutions exist and their main differences
    • Apache Kafka intro  
  • Resource management: Mesos, YARN

Commercial distributions of Hadoop referenced throughout the course: Cloudera and Hortonworks (general considerations).
Important: We will work with the Cloudera commercial distribution (CDH) throughout the course.

Skills gained from the course
  • Understand the Big Data Hadoop ecosystem and the applicability and scope of its different components
  • Understand the general differences between the main commercial distributions of Hadoop

Course Requirements:

  • An open Internet connection is required throughout the course: the exercises run in the cloud, so the connection must be unrestricted and reliable;
  • Each participant needs their own computer to run the hands-on exercises; the computer's settings must also allow access to Google Docs and GitHub to obtain the presenter's slides, documents, data, and exercises;
  • Each computer needs the Google Chrome browser and an SSH client to connect to the cloud environment;
  • For the online version of the course, two screens are recommended so that the student can follow the trainer while completing the hands-on exercises.
