
# Reshaping Data with Pandas

## While analyzing data, we may need to reshape tabular data. Pandas has two methods that aid in reshaping the data into a desired format.

Pandas provides two methods, melt() and pivot(), to reshape data. They work similarly to the gather() and spread() functions of the ‘tidyr’ package in R, respectively. We’ll consider a balance sheet in the format reported by companies, stored in a pandas DataFrame named ‘df’.
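As a minimal sketch of the two methods (using a made-up two-row balance sheet rather than the article’s actual ‘df’):

```python
import pandas as pd

# A hypothetical balance sheet in wide (as-reported) format
df = pd.DataFrame({
    "Item": ["Cash", "Inventory"],
    "2019": [100, 50],
    "2020": [120, 60],
})

# melt(): wide -> long, like tidyr's gather()
long_df = pd.melt(df, id_vars="Item", var_name="Year", value_name="Amount")
print(long_df.shape)  # → (4, 3)

# pivot(): long -> wide again, like tidyr's spread()
wide_df = long_df.pivot(index="Item", columns="Year", values="Amount")
print(wide_df.loc["Cash", "2020"])  # → 120
```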

# Essential Functions in Excel for Data Preprocessing

## This article discusses a few essential functions in Microsoft Excel for data preprocessing, along with examples that show how they simplify the process.

For non-programmers, Microsoft Excel is a great tool for preprocessing and handling structured data. It offers functions and techniques that make it easier to clean structured data. We’ll discuss a few of the many available functions along with examples. Before proceeding further, we’ll cover a few basic functions that will become part of larger formulas later in this article.

# Basic Functions

1. IF

This function checks a condition and returns a specified value accordingly. In the example below, the function checks the condition “is 2 greater than 3”…

# A Beginner’s Dilemma in Exploratory Data Analysis

## Exploratory data analysis (EDA) is an important step in a data science project where you get a feel for your data.

For beginners, EDA can pose a few challenges. This article discusses one such challenge that every beginner in exploratory data analysis may face at some point. It assumes the reader has a basic knowledge of EDA.

Recently, I was playing with a toy dataset (Loan Data) acquired from Kaggle. I used this dataset to create a dashboard in Tableau and publish it to Tableau Public (you may find the published dashboard here). In the early days of my data science journey, I had a dilemma of choosing between the questions to get the right answer from…

# Deploying a basic Streamlit app to Heroku

## This article demonstrates the deployment of a basic Streamlit app (that simulates the Central Limit Theorem) to Heroku

Streamlit is an open-source app framework for deploying machine learning apps built in Python, similar to the Shiny package in R. Heroku is a platform as a service (PaaS) that enables deploying and managing applications built in several programming languages in the cloud.

According to the Central Limit Theorem, as the sample size increases, the mean of the sample means gets closer to the population mean. The distribution of the sample means (a.k.a. the sampling distribution of the sample means) also looks more Gaussian, irrespective of the underlying population distribution, given a sufficient number…
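The simulation such an app performs can be sketched roughly as follows (an illustrative NumPy version with a made-up exponential population, not the app’s actual code):

```python
import numpy as np

rng = np.random.default_rng(0)

# A decidedly non-Gaussian population (exponential, mean ~2.0)
population = rng.exponential(scale=2.0, size=100_000)

def sample_means(n, n_samples=2_000):
    """Draw n_samples samples of size n and return each sample's mean."""
    samples = rng.choice(population, size=(n_samples, n))
    return samples.mean(axis=1)

for n in (5, 50, 500):
    means = sample_means(n)
    # The mean of the sample means stays near the population mean,
    # while their spread shrinks roughly like 1/sqrt(n)
    print(n, round(means.mean(), 2), round(means.std(), 3))
```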

# Deploying a basic Streamlit app

## This article demonstrates the deployment of a basic Streamlit app (that predicts the Iris’ species) to Streamlit Sharing.

Streamlit is an open-source app framework for deploying machine learning apps built in Python, similar to the Shiny package in R. This article assumes the reader has a basic working knowledge of Conda environments, Git and machine learning with Python.

# Model Development

We’ll fit a logistic regression model to the Iris dataset from the Scikit-Learn package. The code below splits the dataset into train and test sets, so the model can be evaluated on the test set after deployment. We’ll use the mutual information metric for feature selection with the ‘SelectKBest’ method. …
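A minimal sketch of such a pipeline (the split ratio, k=2 and random_state are illustrative choices, not necessarily the article’s):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)

# Hold out a test set to evaluate the model after deployment
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Keep the k=2 features with the highest mutual information with the
# target, then fit logistic regression on the selected features
model = Pipeline([
    ("select", SelectKBest(mutual_info_classif, k=2)),
    ("clf", LogisticRegression(max_iter=1000)),
]).fit(X_train, y_train)

print(round(model.score(X_test, y_test), 2))
```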

# Structured Query Language (SQL)

SQL is a language used to manage relational databases, where data is stored in the form of tables. A table in a relational database management system (RDBMS) is similar to a spreadsheet where each column is called a field and each row is called a record. ‘name’, ‘age’ and ‘gender’ are a few examples of fields.

# Data Import

We’ll use the Titanic dataset from Kaggle and import it into Postgres (PostgreSQL). Below is the process to import the data.

Step 1:

Right-click on ‘Databases’, select ‘Create’ -> ‘Database’, type the name of the database and click ‘Save’.

# The right way of using SMOTE with Cross-validation

## This article discusses the right way to use SMOTE to avoid inaccurate evaluation metrics while using cross-validation

This article assumes the reader has a working knowledge of SMOTE, an oversampling technique for handling imbalanced class problems. We’ll discuss the right way to use SMOTE to avoid inaccurate evaluation metrics while using cross-validation techniques. First, we’ll look at the method which may result in an inaccurate cross-validation metric. We’ll use the breast cancer dataset from Scikit-Learn, whose classes are slightly imbalanced.
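The core idea can be sketched with plain Scikit-Learn (here simple random duplication of minority rows stands in for SMOTE, which synthesizes new points instead; the “inside the fold” principle is the same either way):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# The slightly imbalanced breast cancer dataset (212 vs 357)
X, y = load_breast_cancer(return_X_y=True)

def oversample(X, y, rng):
    """Duplicate minority-class rows until classes are balanced
    (a stand-in for SMOTE in this sketch)."""
    classes, counts = np.unique(y, return_counts=True)
    minority_idx = np.where(y == classes[np.argmin(counts)])[0]
    extra = rng.choice(minority_idx, counts.max() - counts.min(), replace=True)
    return np.vstack([X, X[extra]]), np.concatenate([y, y[extra]])

rng = np.random.default_rng(42)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = []
for train_idx, test_idx in cv.split(X, y):
    # Right way: oversample ONLY the training fold, AFTER the split,
    # so copies of test rows never leak into the training data
    X_tr, y_tr = oversample(X[train_idx], y[train_idx], rng)
    model = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
    scores.append(model.score(X[test_idx], y[test_idx]))

print(round(np.mean(scores), 3))
```

Oversampling the whole dataset before splitting would place duplicates of some test rows in the training folds and inflate the cross-validation score.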

# Understanding the scope of the variables in Python

## This article discusses the scope of the variables in Python, which is one of the fundamental concepts of Python programming.

The scope of a variable is the region of the code where the variable is available/accessible. A variable declared outside a function (i.e. in the main region of the code) is called a global variable, and a variable declared inside a function is called a local variable of that function.

```python
##################
# GLOBAL VARIABLES
##################

def x():
    ################
    # LOCAL VARIABLE
    ################
    ...
```

Let’s look at an example to understand it better. In the below example, we declare a variable named ‘global_variable’ in the main section of the code and a variable named ‘local_variable’ inside a function.

```python
global_variable = 1

def function()…
```
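A runnable illustration of the two scopes (the names are illustrative):

```python
global_variable = 1  # declared in the main region: global scope

def show_scope():
    local_variable = 2  # declared inside the function: local scope
    # Both the global and the local variable are accessible here
    return global_variable + local_variable

print(show_scope())     # → 3
print(global_variable)  # → 1
# print(local_variable)  # would raise NameError: local to show_scope()
```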

# Understanding List Comprehensions in Python

## This article discusses list comprehensions in Python and how to use them to make your code more efficient and Pythonic.

```python
x = list(range(10))
x
```
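As a small illustration (not from the article), a list comprehension condenses a loop-plus-append pattern into a single expression:

```python
# Loop version: collect squares of the even numbers below 10
squares = []
for n in range(10):
    if n % 2 == 0:
        squares.append(n ** 2)

# Equivalent, more Pythonic list comprehension
squares_lc = [n ** 2 for n in range(10) if n % 2 == 0]

print(squares_lc)  # → [0, 4, 16, 36, 64]
```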