Let me start by saying that this is a very common situation in companies that use an AWS cluster with Zeppelin notebooks. In my experience, whenever you want to look at your script, you first have to start an EMR cluster, and only then can you view it.

How does it affect your company?

Running an EMR cluster on AWS costs the company money for every hour it is up, so we can use a few techniques to avoid this unnecessary expense.

There are many ways to do so…

Let's Begin the Journey

Let us talk about these two profiles. There is a big difference between these two job profiles.

Data analysts, as the name suggests, spend almost 80 percent of their time with data, while business analysts spend their time with the business.

The next question is what type of work each profile does and which tools are most popular among these two job profiles.

I will answer from my own experience; it may differ from person to person.

The data analyst will work on data engineering and data visualization, while the business analyst will work with data…

Data Engineering

Let’s start here, since almost 70 percent of your time goes into data engineering work. Data engineering includes shaping data to meet business demands, data wrangling, data cleaning, and data preprocessing.

Some of the languages and tools commonly used across industries are Python, R, Scala, Splunk, and SQL. Of these, you should know at least two. For newcomers, I would suggest starting with Python and SQL.

During my year of experience, I devoted almost 80% of my time to data preparation, data formation, and other operations…

Do you know that 98% of people don’t know what they are learning in Data Science? "Hey, I want to become a Data Scientist. What should I do now? Where should I start? Which programming language should I learn? Is it important to learn cloud technologies like AWS, Azure, etc.?"

I will try to answer these from my experience. The first question I ask people is: why do you want to become a Data Scientist?

The common answer: because there is a chance of more money and growth. Seriously, are you sure? Wait…

The Name Node and the Job Tracker are each called a single point of failure. But how? Let me explain in very simple terms.

The Name Node is the manager, and the Job Tracker is the assistant to the Name Node. We know that every farmer's record is kept by the godown manager. If the godown manager and his records somehow go missing, everything is gone: there is no way to recover the records, which means the farmers lose their crops.

Similarly, if the Job Tracker goes down, no information will reach the Name Node, and in that case too the farmer will not be able to…
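
The godown analogy can be sketched as a small toy model in Python. This is only an illustration of the single-point-of-failure idea, not Hadoop code: the class `GodownManager`, the helper `try_locate`, and the farmer and room values are all hypothetical names invented for this sketch.

```python
class GodownManager:
    """Toy stand-in for the Name Node: the only holder of the storage records."""

    def __init__(self):
        self.records = {}   # farmer name -> room number (hypothetical data)
        self.alive = True   # flips to False when the manager "goes missing"

    def store(self, farmer, room):
        if not self.alive:
            raise RuntimeError("manager unavailable: cannot record storage")
        self.records[farmer] = room

    def locate(self, farmer):
        if not self.alive:
            raise RuntimeError("manager unavailable: records cannot be read")
        return self.records[farmer]


def try_locate(manager, farmer):
    """Return the farmer's room number, or None if the manager is down."""
    try:
        return manager.locate(farmer)
    except RuntimeError:
        return None


manager = GodownManager()
manager.store("farmer_a", 3)

found_before = try_locate(manager, "farmer_a")  # manager is up, lookup works

manager.alive = False                           # the single manager fails
found_after = try_locate(manager, "farmer_a")   # crops still exist, but nobody knows where
```

The crops are still in the godown after the failure; what is lost is the only copy of the information about where they are, which is why one unreplicated manager is a single point of failure.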

Did you know that machines have now started shaping even your decision-making ability?
Let us take you into this technology-filled world and show you how your decision making changes, and how this is happening.

Let's take a simple example. When you buy something on Amazon or any other e-commerce site, recommendations start appearing along with that item. Those recommendations show you exactly the kind of things you like, and drawn to them, you immediately change your decision…

I am a farmer who grows wheat, and I have very limited storage capacity. So what should I do if I grow more than I can store?

I will approach the nearest godown, which makes me the client. Whom should I approach? The godown manager, which means the manager acts as the Name Node. What does the manager do? He checks whether a room is available for storage, and for this he tells his helper to go and check.

In big data terms, this helper is known as a Job Tracker, this…

A perfect tool for cross-checking your results, but how?

Let's talk about my journey with PyCaret. I started using it in April, and it has helped me a lot during model building. It is absolutely perfect for cross-checking your results if you are new to machine learning. When I started on machine learning projects, I always wanted to try different types of models and always did my best to improve my model. That takes a lot of time, because if you follow the normal procedure you have to run every machine learning algorithm one by one…
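
As a rough sketch of how PyCaret can replace running algorithms one by one, the function below wraps its classification workflow. It assumes a recent PyCaret release is installed and that the problem is a classification task. The function name `compare_all_models`, its arguments, and the `session_id` value are choices made for this illustration, not something taken from the article.

```python
def compare_all_models(df, target):
    """Train and rank PyCaret's available classifiers on `df` in one call.

    `df` is assumed to be a pandas DataFrame and `target` the name of
    the label column (both are assumptions of this sketch).
    """
    # Imported inside the function so the sketch can be read and loaded
    # even on a machine where PyCaret is not installed.
    from pycaret.classification import setup, compare_models, pull

    # setup() initialises the experiment (train/test split, preprocessing).
    setup(data=df, target=target, session_id=42)

    # compare_models() cross-validates the available estimators and
    # returns the best one according to the default metric.
    best_model = compare_models()

    # pull() returns the most recently displayed score grid as a DataFrame.
    leaderboard = pull()
    return best_model, leaderboard
```

Calling `compare_all_models(my_dataframe, "label")` with your own data would return the top-ranked model together with a table of scores for the candidates, which you can then compare against a model you built by hand.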


