ONTAP Recipes: Did you know you can…?
Easily create a data lake using Apache Hadoop and ONTAP storage
The term “data lake” refers to a centralized store for enterprise data (structured, semi-structured, and unstructured) that is used by multiple enterprise applications.
This recipe highlights the steps to create a data lake with Apache Hadoop on ONTAP storage.
1. Determine hardware and network requirements using the Hortonworks cluster planning guide:
For evaluation purposes, a single server may be sufficient.
A two-node cluster can be configured with two servers: one master node and one worker node.
Larger clusters intended for production will require the following:
- 1 NameNode server
- 1 ResourceManager server
- Several worker node servers, each running both the DataNode and NodeManager services. The number of worker nodes will depend on the desired compute capacity. The planning guide should help with making that determination.
HA is recommended for production clusters, so you may also need a standby NameNode server and a standby ResourceManager server.
2. Determine the data set size. Since data in a Hadoop cluster can grow quickly, increase that size by 20% or more, based on growth projections.
3. Apply the HDFS replication factor to the data set size to determine storage requirements.
For ONTAP storage, a replication factor of 2 is sufficient, because the underlying ONTAP aggregates already provide RAID protection. To get the storage requirement, multiply the data set size by 2.
4. Calculate the storage for each datanode so that the total data set is spread evenly across all of them.
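To make the sizing in steps 2 through 4 concrete, here is a minimal Python sketch. The data set size, growth allowance, and node count are hypothetical values; substitute your own numbers.

```python
# Rough HDFS-on-ONTAP capacity sizing (hypothetical numbers).
raw_data_tb = 100              # current data set size, in TB
growth_factor = 1.20           # step 2: allow 20% (or more) for growth
replication_factor = 2         # step 3: HDFS replication of 2 on ONTAP
worker_nodes = 8               # number of datanode servers

projected_data_tb = raw_data_tb * growth_factor              # step 2
total_storage_tb = projected_data_tb * replication_factor    # step 3
per_datanode_tb = total_storage_tb / worker_nodes            # step 4

print(f"Projected data set:   {projected_data_tb:.1f} TB")
print(f"Total HDFS storage:   {total_storage_tb:.1f} TB")
print(f"Per-datanode storage: {per_datanode_tb:.1f} TB")
```

With these example numbers, each of the eight datanodes needs roughly 30 TB of ONTAP-backed storage.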
Per NetApp SAN best practices, configure the storage as follows:
a. Configure an SVM with the FC protocol enabled.
b. Configure LIFs, aggregates, volumes, and LUNs to meet the storage requirements. Two LUNs per datanode, with one LUN per volume, should be sufficient.
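As a sketch of what the layout in item b might look like when scripted, the following uses the netapp_ontap Python client library (the ONTAP REST API client) to create two volumes per datanode, each containing a single LUN. The cluster address, SVM, aggregate, node count, names, and sizes are hypothetical, and field names may need adjusting for your ONTAP release; the same layout can also be built in System Manager or the ONTAP CLI.

```python
# Sketch: provision two volume/LUN pairs per datanode with the
# netapp_ontap REST client (hypothetical names and sizes).
from netapp_ontap import config, HostConnection
from netapp_ontap.resources import Volume, Lun

config.CONNECTION = HostConnection(
    "cluster-mgmt.example.com", username="admin", password="********",
    verify=False,
)

SVM = "hadoop_svm"          # SVM created in step (a), FC enabled
AGGR = "aggr_hadoop01"      # data aggregate backing the volumes
LUN_SIZE = 15 * 1024**4     # 15 TiB per LUN (two per node, from the sizing above)

for node in range(1, 9):                 # 8 worker nodes
    for lun_idx in range(2):             # two LUNs per datanode
        vol_name = f"dn{node:02d}_vol{lun_idx}"
        Volume(name=vol_name,
               svm={"name": SVM},
               aggregates=[{"name": AGGR}],
               size=int(LUN_SIZE * 1.05)).post()   # small overhead margin
        Lun(name=f"/vol/{vol_name}/{vol_name}_lun",
            svm={"name": SVM},
            os_type="linux",
            space={"size": LUN_SIZE}).post()
```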
5. Refer to the Hortonworks Ambari automated install documentation and then complete the Hadoop installation:
https://docs.hortonworks.com/HDPDocuments/Ambari/Ambari-2.2.2.0/index.html
a. Determine which server operating system will be used and then configure your servers per the minimum system requirements.
b. On the storage array, create FC igroups and map storage LUNs to the datanodes.
c. On the datanode servers, partition the LUNs and create file systems on them (see the host-side sketch after this list).
d. Create mountpoints on the datanodes for the new file systems.
e. Mount the file systems on the datanodes.
f. Follow the procedure outlined in the Ambari documentation for preparing the environment, configuring the Ambari repository, installing the Ambari Server, and deploying the HDP cluster. Once the Ambari Server has been installed, the deployment is a guided, automated procedure. During cluster configuration, point the DataNode data directories (dfs.datanode.data.dir) at the ONTAP-backed mount points created above.
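For sub-steps c through e, the host-side work on each datanode amounts to partitioning each mapped LUN, creating a file system, and mounting it. Here is a minimal sketch using Python to drive standard Linux tools; the device names and mount points are hypothetical, and with multipathing the devices would appear as /dev/mapper names instead.

```python
# Sketch: prepare and mount two ONTAP LUNs on a datanode
# (hypothetical device names; run as root).
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Mapped LUNs as seen by this host, and where to mount them.
luns = {"/dev/sdb": "/hadoop/disk0",
        "/dev/sdc": "/hadoop/disk1"}

for device, mountpoint in luns.items():
    run(["parted", "-s", device, "mklabel", "gpt",
         "mkpart", "primary", "0%", "100%"])      # step c: partition the LUN
    partition = device + "1"
    run(["mkfs.xfs", "-f", partition])            # step c: create the file system
    run(["mkdir", "-p", mountpoint])              # step d: create the mountpoint
    run(["mount", partition, mountpoint])         # step e: mount the file system
    # For persistence across reboots, also add each file system to /etc/fstab.
```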
After the Ambari Hadoop deployment has finished, data can be loaded into HDFS using a number of utilities, including Flume and Sqoop. We’re now able to harness all the power of ONTAP for Hadoop.
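As a quick end-to-end check of the new data lake, a few files can be pushed into HDFS with the standard hdfs dfs commands; Flume and Sqoop have their own configuration-driven workflows that are not shown here. The paths below are hypothetical.

```python
# Sketch: copy a local file into HDFS as a simple smoke test
# (hypothetical paths; assumes the 'hdfs' CLI is on the PATH).
import subprocess

def hdfs(*args):
    """Run an 'hdfs dfs' subcommand and fail loudly on errors."""
    subprocess.run(["hdfs", "dfs", *args], check=True)

hdfs("-mkdir", "-p", "/datalake/raw/logs")              # create a landing area
hdfs("-put", "/tmp/weblogs.gz", "/datalake/raw/logs/")  # load a local file
hdfs("-df", "-h")                                       # confirm HDFS capacity
```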
Below is a diagram showing an example of an ONTAP-based data lake:
For more information, see the ONTAP 9 documentation center.