Posts

Hadoop :

Overview

Hadoop wasn't the first solution to this problem. Google was arguably the grandfather of all of these: the Google File System (GFS) inspired Hadoop's distributed storage, and MapReduce inspired Hadoop's distributed processing.

What is Hadoop used for?

Hadoop is a popular open-source framework used for distributed storage and processing of large volumes of data. It is primarily designed to handle big data applications, where traditional databases and data processing tools may fall short. Hadoop's core components include:

Hadoop Distributed File System (HDFS): A distributed file system designed to store large datasets across multiple commodity hardware nodes. It provides fault tolerance by replicating data across different nodes.

MapReduce: A programming model ...

MapReduce |

What is MapReduce?

https://www.youtube.com/watch?v=b-IvmXoO0bU

Why MapReduce?

Prior to 2004, huge volumes of data were stored on single servers. If a program ran a query over data spread across multiple servers, logically integrating the search results and analyzing the data was a nightmare, not to mention the massive effort and expense involved. The threat of data loss, the challenge of data backup, and reduced scalability caused the issue to snowball into a crisis of sorts.

To counter this, Google introduced MapReduce in December 2004. Analysis of datasets could then be done in 10 minutes rather than 8-10 days. Queries could run on multiple servers simultaneously, search results could be logically integrated, and data could be analyzed in real time. The USP of MapReduce is its fault tolerance and scalability ....
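
To make the idea of running work on many servers and then logically integrating the results more concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps using word counting. This is plain Python, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and each string in `splits` only stands in for a chunk of data that would live on a separate server.

```python
from collections import defaultdict


def map_phase(split):
    """Map step: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key, ready for the reducers."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(splits):
    """Run map over every split, then shuffle, then reduce each key."""
    mapped = []
    for split in splits:  # on a real cluster, each split is mapped on its own node
        mapped.extend(map_phase(split))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big cluster"]))
```

Because each call to `map_phase` depends only on its own split, the map work can be spread across machines. The reduce step is where the partial results are integrated into one answer.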