H9RCOMP: As The Volume Of Data Is Increasing At An Exponential Rate, It Has Become Of Utmost Importance To Analyze That Bigdata Efficiently For The Betterment: Research In Computing Thesis, NCI, Ireland

University	National College of Ireland (NCI)
Subject	H9RCOMP: Research In Computing

Introduction

As the volume of data increases at an exponential rate, analyzing Big Data efficiently has become essential, both for improving an organization and for research. Query processing is central to evaluating large datasets and obtaining meaningful results from them. Two major tools for this are Hive, which is based on MapReduce, and Spark, which is built around Resilient Distributed Datasets (RDDs). Hive, created by Facebook, is a data warehouse on which SQL queries can be executed. These queries are converted to MapReduce tasks under the hood and run on the underlying Hadoop cluster and HDFS. PySpark is the Python API for Apache Spark, an open-source data processing framework based on RDDs that can also process near-real-time data; it includes the PySpark SQL module for querying structured data quickly. With the emergence of these query technologies, cloud platforms now offer them as ready-made services, which minimizes installation and configuration effort and leaves more time for the actual data analysis and for enhancing query capabilities.

A good ETL framework provides an environment for data wrangling and data handling, and for scaling incrementally with load. The goal of the ETL process is to extract massive amounts of data from diverse platforms and load it into a data warehouse. This work aims to develop and implement a process for extracting, transforming, and loading (ETL) raw data from a variety of data sources into meaningful and useful information in a data warehouse or data lake. Orchestration of the ETL process is also important, and Apache Airflow can be employed for this purpose. The work also explores the capabilities of cloud services for building a scalable data processing infrastructure.
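To make the extract, transform, and load stages concrete, the following is a minimal sketch using only the Python standard library. It is not the framework proposed in the brief: the sample data, the `orders` table, and the function names are assumptions made for illustration, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; a real pipeline would read from files, APIs, or databases.
RAW_CSV = """order_id,customer,amount
1,alice,10.50
2,bob,
3,alice,4.25
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing amount and cast column types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append((int(row["order_id"]), row["customer"], float(row["amount"])))
    return cleaned


def load(records, conn):
    """Load: write cleaned records into a warehouse-style table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(text, conn):
    """Run the three stages in order."""
    load(transform(extract(text)), conn)


# SQLite in memory stands in for the data warehouse (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
run_etl(RAW_CSV, conn)
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

Keeping each stage as a separate function means any one of them could later be swapped for a distributed equivalent, for example a Spark job for the transform step, without changing the overall flow.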

Research Area

Big Data Tools, ETL framework, Orchestration, Cloud platform, Open Source

Research Questions

Q1. How can an open-source framework for ETL be created?

Q2. How can a cloud platform's services be used to integrate Big Data tools for the ETL process?

Q3. How can huge amounts of data from different types of sources and destinations be processed in parallel?

Q4. How can the ETL framework or tool be orchestrated?
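As a rough illustration of what Q3 and Q4 involve, the sketch below orders tasks by their dependencies and runs independent tasks in parallel. It uses only the Python standard library and is not Apache Airflow code; the task names, the `DEPENDENCIES` mapping, and the `run_task` placeholder are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform": {"extract_orders", "extract_customers"},
    "load": {"transform"},
}


def run_task(name):
    """Placeholder for real work such as a query or a file transfer."""
    return name


def run_pipeline(dependencies, max_workers=4):
    """Run tasks in dependency order, executing each ready batch in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    batches = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Tasks whose dependencies are all complete can run together.
            ready = sorted(sorter.get_ready())
            finished = list(pool.map(run_task, ready))
            for name in finished:
                sorter.done(name)
            batches.append(finished)
    return batches


batches = run_pipeline(DEPENDENCIES)
print(batches)
```

Here the two extract tasks have no dependencies and so run in the same batch, while `transform` and `load` wait for their upstream tasks. A dedicated orchestrator adds scheduling, retries, and monitoring on top of this basic dependency model.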
