Page 1 :
For More Updates Visit: www.python4csip.com, , Unit-2-Data Handling using, Pandas-II, , Descriptive Statistics, Statistics is a branch of mathematics that deals with, collecting, interpreting, organization and interpretation of, data. Descriptive statistics involves summarizing and organizing, the data so that it can be easily understood., , CREATED BY: SACHIN BHARDWAJ, PGT (CS) KV NO.1 TEZPUR, MR. VINOD KUMAR VERMA,, PGT (CS) KV OEF KANPUR
Page 11 :
For More Updates Visit: www.python4csip.com, Method to find Quartiles?, Let us take an example: suppose we have numbers- 1,3,4,7,8,8,9, , Step 1: Arrange the data in ascending order (already in, ascending order), Step 2: Count total number of observation, say n=7, Step 3: Find out first quartile i.e. Q1 (25%) say 0.25, also called 25TH percentile, Step 4: Now calculate Q1=round (.25(n+1))= round, (.25(7+1)), = round (.25(8)) = 2.0 it means 2ND Observation i.e. 3, Step 5: Calculate second quartile i.e. Q2 (50%) = 0.50, or 50TH percentile, = round (.50(7+1)) = 4TH observation i.e. 7, Step 6: Calculate third Quartile i.e. Q3 (75%) =0.75 or, 75TH percentile = round (.75(7+1)) = 6TH observation=8, , CREATED BY: SACHIN BHARDWAJ, PGT (CS) KV NO.1 TEZPUR, MR. VINOD KUMAR VERMA,, PGT (CS) KV OEF KANPUR
Page 18 :
For More Updates Visit: www.python4csip.com, , Sorting, Sorting in data frame can be done row wise or column wise. By default, sorting is done row wise., Pandas provide two types of sort functions1. sort_values(): To sort the data of a given column in ascending or, descending order., 2. sort_index(): To sort the data based on index value., , sort_values() : To sort the data of a given column in, ascending or descending order., Syntax:df.sort_values(by=’col_name’, ascending=True or False, inplace =True or False), by: Give column name on which you want to perform sorting., Ascending : By default ascending is true., Inplace : By default inplace is false. It means if you do not want to create, a new data frame then set its value as True., , CREATED BY: SACHIN BHARDWAJ, PGT (CS) KV NO.1 TEZPUR, MR. VINOD KUMAR VERMA,, PGT (CS) KV OEF KANPUR
Page 21 :
For More Updates Visit: www.python4csip.com, Example 3- To sort a data frame based on multiple column., For performing sorting based on multiple column we dodf.sort_values (by=[‘col1’, ‘col2’], ascending=[(True or False), (True or False) ], , As we are sorting the data in ascending, order of Percentage so when two values, in Percentage are same then data frame, will be sorted in descending order of Roll, Number., , CREATED BY: SACHIN BHARDWAJ, PGT (CS) KV NO.1 TEZPUR, MR. VINOD KUMAR VERMA,, PGT (CS) KV OEF KANPUR
Page 28 :
For More Updates Visit: www.python4csip.com, , PIVOTING AND AGGREGATION, Pivoting- Pivoting is one of the important aspect of data analyst. It is, used to summarize large amount of data and permit us to access, important records from a large dataset., Python Pandas provide two functions for pivoting., 1. pivot(), 2. pivot-table(), , pivot(), pivot()- pivot() allows us to transform or reshape the data frame based, on the column values according to our perspective. It takes 3, arguments – (index, columns and values)., , CREATED BY: SACHIN BHARDWAJ, PGT (CS) KV NO.1 TEZPUR, MR. VINOD KUMAR VERMA,, PGT (CS) KV OEF KANPUR
Page 31 :
For More Updates Visit: www.python4csip.com, The pivot_table() method comes to solve this problem. It works like, pivot, but it aggregates the values from rows with duplicate entries for, the specified columns (means apply aggregate function specify by us)., By default pivot_table() apply mean() to aggregate the values from rows, with duplicate entries for the specified columns. E.g., , #program to Find City wise temperature, Programimport pandas as pd, data={, 'Date':['1-1-2019','1-1-2019','1-2-2019','1-2-2019','1-3-2019','13-2019'],, CREATED BY: SACHIN BHARDWAJ, PGT (CS) KV NO.1 TEZPUR, MR. VINOD KUMAR VERMA,, PGT (CS) KV OEF KANPUR
Page 32 :
For More Updates Visit: www.python4csip.com, 'City':['DELHI','DELHI','MUMBAI','MUMBAI','CHENNAI','CH, ENNAI'],, 'Temp':[28,30,22,24,32,34],, 'Humidity':[60,55,80,70,90,85], }, df=pd.DataFrame(data), print (df), pv=pd.pivot_table(df,index='City',values='Temp'), print (pv), OutputDate Humidity Temp, , City, , 0 1-1-2019, , 60, , 28, , DELHI, , 1 1-1-2019, , 55, , 30, , DELHI, , 2 1-2-2019, , 80, , 22 MUMBAI, , 3 1-2-2019, , 70, , 24 MUMBAI, , 4 1-3-2019, , 90, , 32 CHENNAI, , 5 1-3-2019, , 85, , 34 CHENNAI, , Temp, City, CHENNAI, , 33, , DELHI, , 29, , MUMBAI, , 23, , CREATED BY: SACHIN BHARDWAJ, PGT (CS) KV NO.1 TEZPUR, MR. VINOD KUMAR VERMA,, PGT (CS) KV OEF KANPUR
Page 33 :
For More Updates Visit: www.python4csip.com, , #Program to find City Wise Maximum temperature, import pandas as pddata={, 'Date':['1-1-2019','1-1-2019','1-2-2019','1-2-2019','1-3-2019','13-2019'],, 'city':['DELHI','DELHI','MUMBAI','MUMBAI','CHENNAI','CH, ENNAI'],, 'Temp':[28,30,22,24,32,34],, 'Humidity':[60,55,80,70,90,85], CREATED BY: SACHIN BHARDWAJ, PGT (CS) KV NO.1 TEZPUR, MR. VINOD KUMAR VERMA,, PGT (CS) KV OEF KANPUR
Page 35 :
For More Updates Visit: www.python4csip.com, #Program to print data frame on date index and city column, Example 3import pandas as pd, data={, 'Date':['1-1-2019','1-1-2019','1-2-2019','1-2-2019','1-32019','1-3-2019'],, 'city':['DELHI','DELHI','MUMBAI','MUMBAI','CHENNAI, ','CHENNAI'],, 'Temp':[28,30,22,24,32,34],, 'Humidity':[60,55,80,70,90,85], }, df=pd.DataFrame(data), print (df), print(pd.pivot_table(df,index='Date',columns='city')), , CREATED BY: SACHIN BHARDWAJ, PGT (CS) KV NO.1 TEZPUR, MR. VINOD KUMAR VERMA,, PGT (CS) KV OEF KANPUR
Page 37 :
For More Updates Visit: www.python4csip.com, , Handling Missing Values- filling & Dropping, In many cases, the data that we receive from many sources may not be, perfect. That means there may be some missing data. For example- in, the given program where employee name is missing in one row and date, of joining is missing in other row., , When we convert the data into data frame, the missing data is, represented by NaN (Not a Number). NaN is a default marker for, the missing value., , CREATED BY: SACHIN BHARDWAJ, PGT (CS) KV NO.1 TEZPUR, MR. VINOD KUMAR VERMA,, PGT (CS) KV OEF KANPUR
Page 39 :
For More Updates Visit: www.python4csip.com, But this is not useful as it is filling any type of column with 0. We can, fill each column with a different value by passing the column name and, the value to be used to fill in that column., For example- to fill ‘ename’ with ‘Name Missing’ and ‘Doj’ wityh ’00-000000’. We should supply these values as a dictionary inside fillna(), method., , CREATED BY: SACHIN BHARDWAJ, PGT (CS) KV NO.1 TEZPUR, MR. VINOD KUMAR VERMA,, PGT (CS) KV OEF KANPUR
Page 41 :
For More Updates Visit: www.python4csip.com, , Importing-Exporting Data between, MySql and Python Pandas, For importing and exporting data between Mysql and Python Pandas we, need to install mysql connector and mysql client module., , Installing and importing mysql connector, mysql, clientWith Anaconda : if we have installed python using Anaconda, then, mysql connector and mysql client need to be installed on your computer., We can check this in Anaconda Navigator, by Clicking on not installed in, Environment and then scroll down to find mysql connector and mysql, client and by clicking on both these, install them in Anaconda., , Steps to import and export data using pandas and Mysql, 1., 2., 3., 4., 5., , Start Python, import mysql.connector package, Create or open a database, Open and establish a connection to the database, Create a cursor object or its instance (required for Pandas to, Mysql), 6. Read a sql query for (Mysql to Pandas) and execute a query for(, Pandas to Mysql), 7. Commit the transaction for(Pandas to Mysql), CREATED BY: SACHIN BHARDWAJ, PGT (CS) KV NO.1 TEZPUR, MR. VINOD KUMAR VERMA,, PGT (CS) KV OEF KANPUR