MOC 20775: Performing Data Engineering on Microsoft HDInsight course

What you will learn

This Big Data course is aimed at anyone who wants to learn to design, plan, and implement Big Data solutions with Microsoft HDInsight. Among other things, you will learn to create HDInsight clusters and to use Spark and Stream Analytics.

After the course, you will be able to:

  • Create HDInsight clusters
  • Implement security and access to resources
  • Copy data to HDInsight
  • Implement batch solutions
  • Design ETL with Spark
  • Analyze data with Spark SQL
  • Analyze data with Hive and Phoenix
  • Use Stream Analytics
  • Develop Big Data real-time solutions with Apache Storm

What you get

Before the course

The opportunity to talk with an instructor who can help you find exactly the right course.

During the course

Teaching by Denmark's most experienced team of instructors in cosy, fully up-to-date classrooms in the centre of Copenhagen.

A course that alternates between theory and practical exercises. We know how important it is that you get time to work through the exercises in practice, so our teaching always has a hands-on focus.

Access to Microsoft's digital courseware (DMOC) and Microsoft Labs Online.*

Full catering, which includes breakfast, freshly brewed coffee, tea, fruit, soft drinks, lunch at an Italian restaurant on Gråbrødretorv, cake, sweets, and of course Wi-Fi for your devices.

A course certificate documenting your new qualifications.

After the course

Access to our free hotline: for up to a year after the course you can call or write to us with questions about the topics covered on the course.

Our unique satisfaction guarantee, which is your assurance of getting the full benefit of your course.

  • The course is taught in Danish, but we use Microsoft's digital courseware (DMOC), which is in English. A Surface tablet is provided during the course for reading the material. Afterwards you will have access to the material both online and locally. If Microsoft releases a new version of the courseware, you will automatically get access to it. In addition, you will have access to exercises via Microsoft Labs Online for a total of 180 days, so you can continue an exercise or start it over from home, during or after the course, as needed.
  • Get the most out of the course

    This Big Data course is part of our overall range of Business Intelligence courses and requires experience with R, R packages, and data analysis in general.

    Course content

    Module 1: Getting Started with HDInsight

    This module introduces Hadoop, the MapReduce paradigm, and HDInsight.

    Lessons

    • Big Data
    • Hadoop
    • MapReduce
    • HDInsight

    Lab : Querying Big Data

    • Query data with Hive
    • Visualize data with Excel

    After completing this module, students will be able to:

    • Describe big data.
    • Describe Hadoop.
    • Describe MapReduce.
    • Describe HDInsight.
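
    As a rough illustration of the MapReduce paradigm named in this module, the following minimal sketch counts words in plain Python. It is not taken from the course material and does not use Hadoop or HDInsight; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are hypothetical and chosen only to show the map, shuffle, and reduce steps.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key.

    In a real cluster the framework does this between map and reduce;
    here it is a plain in-memory dictionary (an assumption of this sketch).
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over a list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input, for illustration only.
    print(word_count(["big data big clusters", "data on clusters"]))
```

    On a cluster, the map and reduce steps run in parallel on many nodes; this sketch runs them sequentially on one machine to keep the idea visible.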

    Module 2: Deploying HDInsight Clusters

    At the end of this module the student will be able to deploy HDInsight clusters.

    Lessons

    • HDInsight cluster types
    • Managing HDInsight Clusters
    • Managing HDInsight Clusters with PowerShell

    Lab : Managing HDInsight clusters with the Azure Portal

    • Create an HDInsight Hadoop Cluster
    • Customize HDInsight using a script action
    • Customize HDInsight using Bootstrap
    • Delete an HDInsight cluster

    After completing this module, students will be able to:

    • Describe HDInsight cluster types.
    • Describe the creation, management, and deletion of HDInsight clusters with the Azure portal.
    • Describe the creation, management, and deletion of HDInsight clusters with PowerShell.

    Module 3: Authorizing Users to Access Resources

    This module covers permissions and the assignment of permissions.

    Lessons

    • Non-domain-joined clusters
    • Configuring domain-joined HDInsight clusters
    • Manage domain-joined HDInsight clusters

    Lab : Authorizing Users to Access Resources

    • Configure a domain-joined HDInsight cluster
    • Configure Hive policies

    After completing this module, students will be able to:

    • Describe how to authorize user access to objects.
    • Describe how to authorize users to execute code.
    • Describe how to manage domain-joined HDInsight clusters.

    Module 4: Loading data into HDInsight

    This module covers loading data into HDInsight.

    Lessons

    • HDInsight Storage
    • Data loading tools
    • Performance and reliability

    Lab : Loading Data into HDInsight

    • Loading data using Sqoop
    • Loading data using AzCopy
    • Loading data using AdlCopy
    • Use HDInsight to compress data

    After completing this module, students will be able to:

    • Describe HDInsight storage configurations and architectures.
    • Describe options for loading data into HDInsight.
    • Describe benefits of compression and pre-processing in HDInsight.
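
    To give a feel for why compression matters when loading repetitive data such as logs, here is a small sketch using Python's standard gzip module. It is not part of the course labs and runs locally rather than on HDInsight; the helper name compression_ratio and the sample log line are hypothetical.

```python
import gzip


def compression_ratio(text: str) -> float:
    """Return compressed size divided by raw size for a UTF-8 string."""
    raw = text.encode("utf-8")
    packed = gzip.compress(raw)
    # Round-trip check: decompressing must give back the original bytes.
    assert gzip.decompress(packed) == raw
    return len(packed) / len(raw)


if __name__ == "__main__":
    # Hypothetical, highly repetitive log data, for illustration only.
    sample_log = "2017-01-01 12:00:00 GET /index.html 200\n" * 1000
    print(f"compressed/raw size ratio: {compression_ratio(sample_log):.4f}")
```

    Real data is less repetitive than this sample, so actual savings depend on the data and on the codec the cluster is configured to use.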

    Module 5: Troubleshooting HDInsight

    This module describes how to troubleshoot HDInsight.

    Lessons

    • Analyze HDInsight logs
    • YARN logs
    • Heap dumps
    • Operations Management Suite

    Lab : Troubleshooting HDInsight

    • Analyze HDInsight logs
    • Analyze YARN logs
    • Monitor resources with Operations Management Suite

    After completing this module, students will be able to:

    • Analyze HDInsight logs.
    • Analyze YARN logs.
    • Analyze Heap dumps.
    • Use the Operations Management Suite to monitor resources.

    Module 6: Implementing Batch Solutions

    This module describes how to implement batch solutions.

    Lessons

    • Apache Hive storage
    • Querying with Hive and Pig
    • Operationalize HDInsight

    Lab : Implementing Batch Solutions

    • Load data into a Hive table
    • Query data with Hive and Pig

    After completing this module, students will be able to:

    • Describe Apache Hive storage.
    • Query data using Hive and Pig.
    • Operationalize HDInsight.

    Module 7: Design Batch ETL solutions for big data with Spark

    This module describes how to design batch ETL solutions for big data with Spark.

    Lessons

    • What is Spark?
    • ETL with Spark
    • Spark performance

    Lab : Design Batch ETL solutions for big data with Spark

    • Create an HDInsight cluster with access to Data Lake Store
    • Use HDInsight Spark cluster to analyze data in Data Lake Store
    • Analyzing website logs using a custom library with Apache Spark cluster on HDInsight
    • Managing resources for Apache Spark cluster on Azure HDInsight

    After completing this module, students will be able to:

    • Describe Spark and when to use it.
    • Describe the use of ETL with Spark.
    • Analyze Spark performance.

    Module 8: Analyze Data with Spark SQL

    This module describes how to analyze data with Spark SQL.

    Lessons

    • Implement interactive queries
    • Perform exploratory data analysis

    Lab : Analyze data with Spark SQL

    • Implement interactive queries
    • Perform exploratory data analysis

    After completing this module, students will be able to:

    • Implement interactive queries.
    • Perform exploratory data analysis.

    Module 9: Analyze Data with Hive and Phoenix

    This module describes how to analyze data with Hive and Phoenix.

    Lessons

    • Implement interactive queries for big data with interactive Hive
    • Perform exploratory data analysis by using Hive
    • Perform interactive processing by using Apache Phoenix

    Lab : Analyze data with Hive and Phoenix

    • Implement interactive queries for big data with interactive Hive
    • Perform exploratory data analysis by using Hive
    • Perform interactive processing by using Apache Phoenix

    After completing this module, students will be able to:

    • Implement interactive queries with interactive Hive.
    • Perform exploratory data analysis using Hive.
    • Perform interactive processing by using Apache Phoenix.

    Module 10: Stream Analytics

    This module introduces Azure Stream Analytics.

    Lessons

    • Stream analytics
    • Process streaming data from stream analytics
    • Managing stream analytics jobs

    Lab : Implement Stream Analytics

    • Process streaming data with stream analytics
    • Managing stream analytics jobs

    After completing this module, students will be able to:

    • Describe stream analytics and its capabilities.
    • Process streaming data with stream analytics.
    • Manage stream analytics jobs.

    Module 11: Spark Streaming using the DStream API

    This module introduces the DStream API and describes how to create Spark structured streaming applications.

    Lessons

    • DStream
    • Create Spark structured streaming applications
    • Persistence and visualization

    Lab : Spark streaming applications using DStream API

    • Creating Spark streaming applications using the DStream API
    • Creating Spark structured streaming applications

    After completing this module, students will be able to:

    • Explain DStream.
    • Create Spark structured streaming applications.
    • Describe persistence and visualization.

    Module 12: Develop big data real-time processing solutions with Apache Storm

    This module explains how to develop big data real-time processing solutions with Apache Storm.

    Lessons

    • Persist long-term data
    • Stream data with Storm
    • Create Storm topologies
    • Configure Apache Storm

    Lab : Developing big data real-time processing solutions with Apache Storm

    • Stream data with Storm
    • Create Storm Topologies

    After completing this module, students will be able to:

    • Persist long-term data.
    • Stream data with Storm.
    • Create Storm topologies.
    • Configure Apache Storm.

    Contact information

    Address
    Amagertorv 21
    1160 København K