Data Science | ML | Web scraping | Kaggler | Perpetual learner | Out-of-the-box Thinker | Python | SQL | Excel VBA | Tableau | LinkedIn: https://bit.ly/2VexKQu

While analyzing data, we may need to reshape tabular data. Pandas has two methods that aid in reshaping the data into a desired format.

Photo by Myriam Jessier on Unsplash

Two pandas methods, melt() and pivot(), reshape data between wide and long formats. They work similarly to the gather() and spread() functions of the ‘tidyr’ package in R, respectively. We’ll consider a balance sheet in the format reported by companies. The balance sheet is stored in a pandas data frame named ‘df’.
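
A minimal sketch of the round trip between the two formats. The balance sheet here is a small made-up stand-in for ‘df’; the column names and figures are hypothetical, not the article's data.

```python
import pandas as pd

# Hypothetical wide-format balance sheet: one row per line item, one column per year.
df = pd.DataFrame({
    "item": ["Cash", "Inventory", "Debt"],
    "2019": [100, 250, 400],
    "2020": [120, 230, 380],
})

# melt(): wide -> long, giving one row per (item, year) pair.
long_df = df.melt(id_vars="item", var_name="year", value_name="amount")

# pivot(): long -> wide, restoring one column per year.
wide_df = long_df.pivot(index="item", columns="year", values="amount")

print(long_df)
print(wide_df)
```

Note that pivot() expects each (index, columns) pair to be unique; data with duplicate pairs would need an aggregating alternative such as pivot_table().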


This article discusses a few essential functions in Microsoft Excel for data preprocessing along with a few examples. These functions make the process of data preprocessing simpler.

This article assumes the reader has a basic knowledge of Excel functions.

Photo by Mika Baumeister on Unsplash

For non-programmers, Microsoft Excel is a great tool for preprocessing and handling structured data. Excel has functions and techniques that make it easier to clean structured data. We’ll discuss a few of the many functions along with a few examples. Before proceeding further, we’ll discuss a few basic functions that will be part of larger formulas later in this article.

Basic Functions

  1. IF

This function checks a condition and returns a specified value accordingly. In the example below, the function checks a condition “is 2 greater than 3”…


Exploratory data analysis (EDA) is an important step in a data science project where you get a feel for your data.

For beginners, EDA might pose a few challenges. This article discusses one such challenge that every beginner in exploratory data analysis may face at some point. This article assumes the reader has a basic knowledge of EDA.

Photo by Firmbee.com on Unsplash

Recently, I was playing with a toy dataset (Loan Data) acquired from Kaggle. I used this dataset to create a dashboard in Tableau and publish it to Tableau Public (you may find the published dashboard here). In the early days of my data science journey, I had a dilemma of choosing between the questions to get the right answer from…


This article demonstrates the deployment of a basic Streamlit app (that simulates the Central Limit Theorem) to Heroku

Image by author

Streamlit is an app framework to deploy machine learning apps built using Python. It is an open-source framework which is similar to the Shiny package in R. Heroku is a platform-as-a-service (PaaS) that enables deploying and managing applications built in several programming languages in the cloud.

According to the Central Limit Theorem, as the sample size increases, the mean of the sample means gets closer to the population mean. The distribution of the sample means (a.k.a. the sampling distribution of sample means) also looks more Gaussian as the sample size increases, irrespective of the underlying population distribution, given a sufficient number…
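
A small standard-library simulation can illustrate the statement. This is a sketch under assumed settings (an exponential population with mean 1.0, 2,000 samples per sample size, and a hypothetical helper named `sample_means`); it is not the code of the Streamlit app.

```python
import random
import statistics

random.seed(0)  # fixed seed so repeated runs give the same numbers


def sample_means(sample_size, num_samples=2000):
    """Draw num_samples samples from a skewed population and return each sample's mean."""
    # The exponential distribution with rate 1 has mean 1.0 and is far from Gaussian.
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(num_samples)
    ]


for n in (2, 10, 50):
    means = sample_means(n)
    # The spread of the sample means shrinks as the sample size grows.
    print(n, round(statistics.fmean(means), 3), round(statistics.stdev(means), 3))
```

Plotting a histogram of `means` for each sample size would show the shape moving from skewed toward bell-shaped.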


This article demonstrates the deployment of a basic Streamlit app (that predicts the Iris’ species) to Streamlit Sharing.

Photo by Joan Gamell on Unsplash

Streamlit is an app framework to deploy machine learning apps built using Python. It is an open-source framework which is similar to the Shiny package in R. This article assumes the reader has a basic working knowledge of Conda environments, Git, and machine learning with Python.

Model Development

We’ll fit a Logistic Regression model to the Iris dataset from the Scikit-Learn package. The code below splits the dataset into train and test sets so that the deployed model can be evaluated on the test set. We’ll use the mutual information metric for feature selection with the ‘SelectKBest’ method. …


This article discusses a few basic SQL commands useful for data analysis. This article doesn’t cover advanced techniques involving multiple tables.

Photo by Joshua Sortino on Unsplash

Structured Query Language (SQL)

SQL is a language used to manage relational databases, where data is stored in the form of tables. A table in a relational database management system (RDBMS) is similar to a spreadsheet where each column is called a field and each row is called a record. ‘name’, ‘age’, and ‘gender’ are a few examples of fields.
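
The field/record vocabulary can be sketched with Python's built-in sqlite3 module. This uses an in-memory SQLite database rather than Postgres, and the table name `people` and its two rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database, used here only for illustration.
conn = sqlite3.connect(":memory:")

# Each column (name, age, gender) is a field.
conn.execute("CREATE TABLE people (name TEXT, age INTEGER, gender TEXT)")

# Each inserted row is a record.
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("Asha", 34, "female"), ("Ben", 28, "male")],
)

cursor = conn.execute("SELECT name, age, gender FROM people ORDER BY age")
fields = [col[0] for col in cursor.description]  # field names of the result
records = cursor.fetchall()                      # list of records as tuples

print(fields)
print(records)
conn.close()
```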

Data Import

We’ll use the Titanic dataset from Kaggle and import it into Postgres/PostgreSQL. Below is the process to import the data into Postgres.

Step 1:

Right-click on ‘Databases’, select ‘Create’ -> ‘Database’, type the name of the database, and click ‘Save’.


This article discusses the right way to use SMOTE to avoid inaccurate evaluation metrics while using cross-validation

Image by Mitchell Luo on Unsplash

This article assumes the reader has a working knowledge of SMOTE, an oversampling technique for handling the class imbalance problem. We’ll discuss the right way to use SMOTE to avoid inaccurate evaluation metrics while using cross-validation techniques. First, we’ll look at the method which may result in an inaccurate cross-validation metric. We’ll use the breast cancer dataset from Scikit-Learn whose classes are slightly imbalanced.


This article discusses the scope of the variables in Python, which is one of the fundamental concepts of Python programming.

Image by Chris Ried on Unsplash

The scope of a variable is the region of the code where the variable is accessible. A variable declared outside a function (i.e. in the main region of the code) is called a global variable, and a variable declared inside a function is called a local variable of that function.

##################
# GLOBAL VARIABLES
##################
def x():
    ################
    # LOCAL VARIABLE
    ################
    pass

Let’s look at an example to understand it better. In the example below, we declare a variable named ‘global_variable’ in the main section of the code and a variable named ‘local_variable’ inside a function.

global_variable = 1
def function()…
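
A minimal, self-contained sketch of the global/local distinction described above. The function name `show_scope` and the values are hypothetical and are not taken from the article's own example.

```python
global_variable = 1  # defined at module level, so it is global


def show_scope():
    local_variable = 2  # exists only while show_scope() runs
    # A function can read a global variable without any declaration.
    return global_variable + local_variable


result = show_scope()
print(result)

# Outside the function, the local name is not defined.
try:
    local_variable
except NameError:
    local_is_visible = False
else:
    local_is_visible = True
print(local_is_visible)
```

Reading a global inside a function works as shown; assigning to it inside the function would instead create a new local name unless the function declares it with the `global` keyword.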


This article discusses list comprehensions in Python and how to use them to make your code more efficient and Pythonic.

Image by Chris Ried on Unsplash

List comprehensions help you in performing basic list operations with minimal code (usually with a single line of code). This makes your code efficient and Pythonic. Let’s look at an example to make the concept of list comprehensions clearer.

Let’s create a list of integers from 0 to 9 and multiply each element in the list by 2. This can be done by iterating through the list with a for loop, multiplying each element by 2, and appending the result to an initially empty list.

x = list(range(10))
x
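
As a sketch of the comparison described above, here is the loop next to an equivalent list comprehension. The names `doubled_loop` and `doubled` are chosen for illustration.

```python
x = list(range(10))

# Loop version: build the result one element at a time.
doubled_loop = []
for value in x:
    doubled_loop.append(value * 2)

# List comprehension: the same result in a single expression.
doubled = [value * 2 for value in x]

print(doubled)
```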

KSV Muralidhar