Two main components of Hadoop
To massively parallelise access to, and processing of, huge datasets in a distributed environment
1. Parallel processing
2. Data locality
Two functions of MapReduce
1. Map function
2. Reduce function
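As a rough illustration of how a Map function and a Reduce function fit together, here is a minimal single-process sketch of a word count. It is not Hadoop code: the names `map_function`, `shuffle`, `reduce_function` and `run_job` are hypothetical, and the "blocks" are plain strings standing in for HDFS blocks.

```python
from collections import defaultdict


def map_function(block):
    """Emit a <word, 1> pair for every word in one block of text."""
    return [(word.lower(), 1) for word in block.split()]


def shuffle(mapped_outputs):
    """Group intermediate pairs so that each key goes to a single reducer."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_function(key, values):
    """Aggregate all values that share one key."""
    return key, sum(values)


def run_job(blocks):
    # One map call per block; each call sees only its own block.
    mapped = [map_function(block) for block in blocks]
    grouped = shuffle(mapped)
    return dict(reduce_function(k, v) for k, v in grouped.items())


# Hypothetical input: two blocks of one file.
blocks = ["big data big cluster", "data node data"]
result = run_job(blocks)
print(result)
```

In a real cluster the map calls would run in parallel on the nodes that hold each block, and the grouping step would move data across the network; the sketch only mirrors the data flow.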
Describe the MapReduce process
1. A Map function runs at the data nodes and writes its output to a <key,value> store on local disk.
2. Each Map function uses only the data in its own block.
3. Intermediate results are sent to the correct Reducer, so that all <key,value> results with the same key are received by the same Reducer.
4. The Reducer aggregates and summarises the intermediate results to create the final result.
5. The final result is stored on HDFS (and thus replicated).
Hadoop Distributed File System
1. Name nodes
2. Data nodes
3. HDFS Clients
Components
1. Resource Manager
2. Node Manager
   1. Manages containers and monitors resource utilisation in each container
   2. Launches containers
HBase vs relational database (RDBMS)
1. HBase: only column families are defined, and the columns within them can differ from one row to the next. RDBMS: all rows have exactly the same structure.
2. HBase: built for huge, wide tables and easily scalable. RDBMS: narrower, smaller tables.
3. HBase: no notion of transactions. RDBMS: ACID transactions always.
4. HBase: avoids joins and normalisation. RDBMS: data is normalised.
5. HBase: good for structured, semi-structured or unstructured data. RDBMS: good for structured data.
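The first point of the comparison (fixed column families, free-form columns per row) can be pictured with nested dictionaries. This is only an in-memory analogy, not the HBase client API; the table contents and the helpers `put` and `columns_of` are made up for illustration.

```python
# Hypothetical model: row key -> column family -> column -> value.
# The column family "info" is fixed; the columns inside it vary per row.
table = {
    "user1": {"info": {"name": "Ana", "email": "ana@example.com"}},
    "user2": {"info": {"name": "Ben", "phone": "555-0100"}},
}


def put(row_key, family, column, value):
    """Write one cell, creating the row or family entry if needed."""
    table.setdefault(row_key, {}).setdefault(family, {})[column] = value


def columns_of(row_key, family):
    """Return the sorted column names present in one row's family."""
    return sorted(table[row_key][family])


# Adding a new column to one row needs no schema change.
put("user2", "info", "city", "Oslo")

print(columns_of("user1", "info"))
print(columns_of("user2", "info"))
```

In a relational table the same change would require altering the schema so that every row gains the new column.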
Which of the following types of HDFS nodes stores all the metadata about a file system?
The NameNode. It stores only the metadata of HDFS: the directory tree of all files in the file system, and it tracks where the files are kept across the cluster. The NameNode does not store the actual data; the data itself is stored in the DataNodes.
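The split between metadata and data can be sketched as two lookups. This is a toy model under stated assumptions, not the HDFS API: the path, the block IDs (`blk_1`, `blk_2`) and the node names (`dn1` to `dn3`) are invented.

```python
# Toy NameNode: file path -> ordered list of (block id, DataNodes holding a replica).
namenode_metadata = {
    "/logs/app.log": [("blk_1", ["dn1", "dn2"]), ("blk_2", ["dn2", "dn3"])],
}

# Toy DataNodes: node name -> block id -> actual bytes.
datanodes = {
    "dn1": {"blk_1": b"first "},
    "dn2": {"blk_1": b"first ", "blk_2": b"second"},
    "dn3": {"blk_2": b"second"},
}


def read_file(path):
    """Ask the NameNode where the blocks are, then fetch the bytes from DataNodes."""
    data = b""
    for block_id, locations in namenode_metadata[path]:
        # Read from the first listed replica; a real client would pick a nearby one.
        data += datanodes[locations[0]][block_id]
    return data


print(read_file("/logs/app.log"))
```

Note that `namenode_metadata` never holds file contents, only the mapping from files to blocks and from blocks to nodes.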
What is the Hadoop Distributed File System (HDFS) designed to handle?
HDFS is the primary storage system used by Hadoop applications. This open-source framework works by rapidly transferring data between nodes. It is often used by companies that need to handle and store big data.
Which of the following is a disadvantage of the hierarchical data model?
The major disadvantage of hierarchical databases is their inflexible nature. The one-to-many structure is not ideal for complex structures, as it cannot describe relationships in which a child node has multiple parent nodes.
Which model is the representation of the database as seen by the DBMS?
The internal model. It is the representation of a database as "seen" by the DBMS. It requires a designer to match the conceptual model's characteristics and constraints to those of the selected implementation model. An internal schema is a representation of an internal model using the database constructs supported by the chosen database.