Pandas has two methods, melt() and pivot(), for reshaping data. They work similarly to the gather() and spread() functions of the ‘tidyr’ package in R, respectively. We’ll consider a balance sheet in the format reported by companies. The balance sheet is stored in a pandas data frame named ‘df’.
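As a rough illustration of the two methods, here is a minimal sketch. The balance sheet values, the column names ('item', 'year', 'amount') and the two reporting years are made up for the example and are not taken from the article's data; only the name 'df' comes from the text.

```python
import pandas as pd

# Hypothetical balance sheet in wide format (values are invented):
# one row per line item, one column per reporting year.
df = pd.DataFrame({
    "item": ["Cash", "Inventory", "Debt"],
    "2019": [100, 250, 300],
    "2020": [120, 240, 280],
})

# melt(): wide -> long. Each (item, year) pair becomes its own row.
long_df = df.melt(id_vars="item", var_name="year", value_name="amount")

# pivot(): long -> wide. The years become columns again, indexed by item.
wide_df = long_df.pivot(index="item", columns="year", values="amount")

print(long_df)
print(wide_df)
```

The long format is usually easier to filter, group, and plot, while the wide format is closer to how a company reports the statement.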
This article assumes the reader has a basic knowledge of Excel functions.
For non-programmers, Microsoft Excel is a great tool for preprocessing and handling structured data. Excel has functions and techniques that make it easier to clean structured data. We’ll discuss a few of these functions along with some examples. Before proceeding further, we’ll discuss a few basic functions that will be part of larger formulas later in this article.
This function checks a condition and returns a specified value accordingly. In the example below, the function checks a condition “is 2 greater than 3”…
For beginners, EDA might pose a few challenges. This article discusses one such challenge that every beginner in exploratory data analysis may face at some point. This article assumes the reader has a basic knowledge of EDA.
Recently, I was playing with a toy dataset (Loan Data) acquired from Kaggle. I used this dataset to create a dashboard in Tableau and publish it to Tableau Public (you may find the published dashboard here). In the early days of my data science journey, I had a dilemma of choosing between the questions to get the right answer from…
Streamlit is an app framework for deploying machine learning apps built using Python. It is an open-source framework similar to the Shiny package in R. Heroku is a platform-as-a-service (PaaS) that enables deploying and managing applications built in several programming languages in the cloud.
According to the Central Limit Theorem, as the sample size increases, the sample means cluster more closely around the population mean. The distribution of the sample means (a.k.a. the sampling distribution of the sample mean) also looks more Gaussian, irrespective of the underlying population distribution, as the sample size increases, given a sufficient number…
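The shrinking spread of the sample means can be checked with a small simulation. This is only a sketch under assumed settings: the exponential population, the sample sizes (2, 30, 500) and the number of samples (2,000) are choices made for illustration, not values from the article.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Assumed population: exponential with mean 1, which is clearly non-Gaussian.
population = [random.expovariate(1.0) for _ in range(100_000)]
population_mean = statistics.fmean(population)


def sample_means(sample_size, n_samples=2_000):
    """Draw n_samples random samples (with replacement) and return each sample's mean."""
    return [
        statistics.fmean(random.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]


mean_of_means = {}
spread = {}
for sample_size in (2, 30, 500):
    means = sample_means(sample_size)
    mean_of_means[sample_size] = statistics.fmean(means)
    spread[sample_size] = statistics.stdev(means)
    print(sample_size, round(mean_of_means[sample_size], 3), round(spread[sample_size], 3))

print("population mean:", round(population_mean, 3))
```

The standard deviation of the sample means should fall as the sample size grows, and a histogram of `means` for the larger sizes should look increasingly bell-shaped.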
Streamlit is an app framework for deploying machine learning apps built using Python. It is an open-source framework similar to the Shiny package in R. This article assumes the reader has a basic working knowledge of Conda environments, Git, and machine learning with Python.
We’ll fit a Logistic Regression model to the Iris dataset from the Scikit-Learn package. The code below splits the dataset into train and test sets so that the deployed model can be evaluated on the test set. We’ll use mutual information as the feature selection metric with ‘SelectKBest’. …
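The article's own code is not shown in this excerpt, so the following is only a sketch of how the described steps might fit together. The 80/20 split, k=2 selected features, the random seeds, and max_iter=1000 are assumptions made for illustration.

```python
from functools import partial

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out a test set (assumed 20%) before any fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Keep the k features with the highest mutual information with the target.
# k=2 is an assumption; random_state makes the MI estimate repeatable.
selector = SelectKBest(partial(mutual_info_classif, random_state=0), k=2)
X_train_sel = selector.fit_transform(X_train, y_train)  # fit on train only
X_test_sel = selector.transform(X_test)                 # reuse the fitted selector

model = LogisticRegression(max_iter=1000)
model.fit(X_train_sel, y_train)

test_accuracy = model.score(X_test_sel, y_test)
print("test accuracy:", round(test_accuracy, 3))
```

Fitting the selector on the training split only keeps information from the test set out of the feature selection step.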
SQL is a language used to manage relational databases, where data is stored in the form of tables. A table in a relational database management system (RDBMS) is similar to a spreadsheet where each column is called a field and each row is called a record. ‘name’, ‘age’, and ‘gender’ are a few examples of fields.
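To make the field/record terminology concrete, here is a small sketch using Python's built-in sqlite3 module as a stand-in for a full RDBMS (it is not Postgres). The table name 'people' and the two rows are invented for the example.

```python
import sqlite3

# In-memory SQLite database, used here only as a lightweight stand-in.
conn = sqlite3.connect(":memory:")

# Each column of the table is a field.
conn.execute("CREATE TABLE people (name TEXT, age INTEGER, gender TEXT)")

# Each inserted row is a record (values are made up).
conn.executemany(
    "INSERT INTO people (name, age, gender) VALUES (?, ?, ?)",
    [("Asha", 34, "F"), ("Ravi", 41, "M")],
)

cursor = conn.execute("SELECT name, age, gender FROM people ORDER BY age")
fields = [column[0] for column in cursor.description]  # field names
records = cursor.fetchall()                            # one tuple per record
conn.close()

print(fields)
print(records)
```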
We’ll use the Titanic dataset from Kaggle and import it into Postgres/PostgreSQL. Below is the process to import the data into Postgres.
Right-click ‘Databases’, select ‘Create’ -> ‘Database’, type the name of the database, and click ‘Save’.
This article assumes the reader has a working knowledge of SMOTE, an oversampling technique for handling the class imbalance problem. We’ll discuss the right way to use SMOTE to avoid inaccurate evaluation metrics when using cross-validation techniques. First, we’ll look at the method that may result in an inaccurate cross-validation metric. We’ll use the breast cancer dataset from Scikit-Learn, whose classes are slightly imbalanced.
The scope of a variable is the region of the code where the variable is accessible. A variable declared outside a function (i.e. in the main region of the code) is called a global variable, and a variable declared inside a function is called a local variable of that function.
Let’s look at an example to understand it better. In the example below, we declare a variable named ‘global_variable’ in the main section of the code and a variable named ‘local_variable’ inside a function.
global_variable = 1
def function()…
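The snippet above is cut off, so here is a self-contained sketch of the same idea. Only the names 'global_variable', 'local_variable' and 'function' come from the text; the function body, the value 2, and the helper names 'result' and 'local_is_visible_outside' are assumptions for illustration.

```python
global_variable = 1  # declared in the main section of the code: global scope


def function():
    local_variable = 2  # declared inside the function: local scope (value is assumed)
    # A function can read a global variable as well as its own locals.
    return global_variable + local_variable


result = function()

# Outside the function, the local variable does not exist.
try:
    local_variable
    local_is_visible_outside = True
except NameError:
    local_is_visible_outside = False

print(result, local_is_visible_outside)
```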
List comprehensions help you perform basic list operations with minimal code (usually a single line). This makes your code concise and Pythonic. Let’s look at an example to make the concept of list comprehensions clearer.
Let’s create a list of integers from 0 to 9 and multiply each element in the list by 2. This can be done by iterating through the elements of the list with a for loop, multiplying each one by 2, and appending the result to an initially empty list.
x = list(range(10))
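The rest of the example is not shown here, so the following sketch fills in both versions described above. The names 'doubled_loop' and 'doubled_comprehension' are hypothetical.

```python
x = list(range(10))

# For-loop version: start with an empty list and append each doubled element.
doubled_loop = []
for element in x:
    doubled_loop.append(element * 2)

# List comprehension version: the same operation in a single line.
doubled_comprehension = [element * 2 for element in x]

print(doubled_loop)
print(doubled_comprehension)
```

Both versions build the same list; the comprehension simply folds the loop and the append into one expression.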