Page 1 :
Visit Python4csip.com for more updates

CHAPTER-1 Data Handling using Pandas – I

Pandas:
• It is a package useful for data analysis and manipulation.
• Pandas provides an easy way to create, manipulate and wrangle data.
• Pandas provides powerful and easy-to-use data structures, as well as the means to quickly perform operations on these structures.

Data scientists use Pandas for the following advantages:
• It easily handles missing data.
• It uses Series as its one-dimensional data structure and DataFrame as its two-dimensional data structure.
• It provides an efficient way to slice the data.
• It provides a flexible way to merge, concatenate or reshape the data.

DATA STRUCTURE IN PANDAS
A data structure is a way of arranging data so that it can be accessed quickly and so that operations such as retrieval, deletion and modification can be performed on it.
Pandas deals with 3 data structures:
1. Series
2. Data Frame
3. Panel (removed from pandas since version 0.25)
Only Series and Data Frame are in our syllabus.

CREATED BY: SACHIN BHARDWAJ PGT(CS) KV NO1 TEZPUR, VINOD VERMA PGT (CS) KV OEF KANPUR
Page 2 :
Series
A Series is a one-dimensional array-like structure with homogeneous data, which can be used to handle and manipulate data. What makes it special is its index attribute, which labels every value and can itself be changed.
It has two parts:
1. Data part (an array of actual data)
2. Associated index with data (an associated array of indexes or data labels)
e.g.

Index   Data
0       10
1       15
2       18
3       22

✓ We can say that a Series is a labeled one-dimensional array which can hold any type of data.
✓ The data of a Series is always mutable, meaning its values can be changed.
✓ But the size of the data of a Series is immutable, meaning it cannot be changed.
✓ A Series may be considered a data structure with two arrays, of which one works as the index (labels) and the other as the actual data.
✓ Row labels in a Series are called the index.
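The two parts of a Series can be sketched in code as below. This is a minimal illustration that reuses the index and data values from the table above; the replacement labels "a" to "d" are made up for the example.

```python
import pandas as pd

# Data part and index part, using the values from the table above
s = pd.Series([10, 15, 18, 22], index=[0, 1, 2, 3])

print(s.index.tolist())    # the index (labels)
print(s.values.tolist())   # the data

# Values are mutable: change the item stored under label 2
s[2] = 20

# The index can be replaced by new labels of the same length
# (the labels "a".."d" are hypothetical)
s.index = ["a", "b", "c", "d"]
print(s)
```

After these steps the Series still has four items; only a value and the labels have changed.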
Page 7 :
Example-2

While adding two Series, if a non-matching index is found in either Series, NaN appears in the result at that non-matching index.

If instead the value missing at a non-matching index is filled in as 0, the result at that index is simply the value from the Series that does have it.
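The two behaviours described in Example-2 can be sketched as follows. The code of the original example is not available here, so the Series values are made up, and the use of Series.add() with fill_value=0 for the second case is an assumption about how the fill was done.

```python
import pandas as pd

# Hypothetical data: labels "a" and "d" each appear in only one Series
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Plain addition: non-matching labels "a" and "d" give NaN
plain = s1 + s2
print(plain)

# add() with fill_value=0: a missing value is treated as 0,
# so "a" keeps 1 and "d" keeps 30
filled = s1.add(s2, fill_value=0)
print(filled)
```

Note that both results are of float type, because NaN cannot be stored in an integer Series.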
Page 14 :
Slicing in Series
Slicing is a way to retrieve subsets of data from a pandas object. The slice syntax is:

SERIES_NAME[start:end:step]

Here start is the position of the first item, end is the position at which the slice stops (when slicing by position, the item at end itself is not included), and step is the increment between successive items.
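A minimal sketch of the slice syntax is given below. The Series values and labels are made up for illustration; the last line also shows that a slice written with labels, unlike a positional slice, includes the end label.

```python
import pandas as pd

# Hypothetical Series with string labels
s = pd.Series([10, 15, 18, 22, 27, 31], index=["a", "b", "c", "d", "e", "f"])

first_three = s[0:3]     # positions 0, 1, 2 (position 3 is excluded)
every_other = s[::2]     # every second item, from the beginning
reversed_s = s[::-1]     # negative step walks backwards
by_label = s["b":"d"]    # label slice: the end label "d" is included

print(first_three)
print(every_other)
print(reversed_s)
print(by_label)
```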
Page 15 :
DATAFRAME
A DataFrame is a two-dimensional object that is useful for representing data in the form of rows and columns. It is similar to a spreadsheet or an SQL table. This is the most commonly used pandas object. Once we store the data in a DataFrame, we can perform various operations that are useful in analyzing and understanding the data.

DATAFRAME STRUCTURE

         COLUMNS
INDEX    PLAYERNAME   IPLTEAM   BASEPRICEINCR
0        ROHIT        MI        13
1        VIRAT        RCB       17
2        HARDIK       MI        14

The leftmost column is the INDEX, the top row holds the COLUMNS (column names), and the remaining cells are the DATA.

PROPERTIES OF DATAFRAME
1. A DataFrame has two axes (indices):
   ➢ Row index (axis=0)
   ➢ Column index (axis=1)
2. It is similar to a spreadsheet, whose row index is called the index and whose column index is called the column name.
3. A DataFrame can contain heterogeneous data.
4. The size of a DataFrame is mutable.
5. The data of a DataFrame is mutable.
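The structure and properties listed above can be sketched with the same player table. The changed price and the extra CAPTAIN column are hypothetical, added only to show that both the data and the size can change.

```python
import pandas as pd

# The table from DATAFRAME STRUCTURE, built from a dictionary of columns
df = pd.DataFrame(
    {
        "PLAYERNAME": ["ROHIT", "VIRAT", "HARDIK"],
        "IPLTEAM": ["MI", "RCB", "MI"],
        "BASEPRICEINCR": [13, 17, 14],
    }
)

print(df.index.tolist())     # row index (axis=0)
print(df.columns.tolist())   # column index (axis=1)
print(df.dtypes)             # heterogeneous: text columns and a number column

# Data is mutable: change one cell (the value 18 is hypothetical)
df.loc[1, "BASEPRICEINCR"] = 18

# Size is mutable: add a new column (hypothetical values)
df["CAPTAIN"] = [True, False, False]
print(df)
```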
Page 32 :
Concat operation in data frame
Pandas provides various facilities for easily combining Series and DataFrame objects.

pd.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False)

• objs − This is a sequence or mapping of Series, DataFrame, or Panel objects.
• axis − {0, 1, ...}, default 0. This is the axis to concatenate along.
• join − {'inner', 'outer'}, default 'outer'. How to handle indexes on the other axis(es): outer for union and inner for intersection.
• ignore_index − boolean, default False. If True, do not use the index values on the concatenation axis. The resulting axis will be labeled 0, ..., n - 1.
• join_axes − This is the list of Index objects. Specific indexes to use for the other (n-1) axes instead of performing inner/outer set logic. This parameter belongs to older pandas releases; it was removed in pandas 1.0.

The concat() function performs concatenation operations along an axis.
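The effect of the axis, ignore_index and join parameters can be sketched as below. All three data frames are made up for illustration, and join_axes is left out because current pandas releases no longer accept it.

```python
import pandas as pd

# Hypothetical data frames
df1 = pd.DataFrame({"id": [1, 2], "name": ["A", "B"]})
df2 = pd.DataFrame({"id": [3, 4], "name": ["C", "D"]})
df3 = pd.DataFrame({"id": [5], "city": ["X"]})

# axis=0 (default): rows are stacked; the original index values are kept
kept = pd.concat([df1, df2])

# ignore_index=True: the result is relabeled 0, ..., n - 1
stacked = pd.concat([df1, df2], ignore_index=True)

# axis=1: the frames are placed side by side
side = pd.concat([df1, df2], axis=1)

# join='inner': only the columns common to both frames are kept
inner = pd.concat([df1, df3], join="inner", ignore_index=True)

print(kept)
print(stacked)
print(side)
print(inner)
```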
Page 35 :
Merge operation in data frame
Two DataFrames might hold different kinds of information about the same entity and be linked by some common feature/column. To join these DataFrames, pandas provides multiple functions such as merge() and join().

Example-1
Merging on the 'id' column gives the rows whose 'id' values are common to both data frames.
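A merge on a common 'id' column might look like the sketch below. The data frames of the original example are not available, so the names and marks here are made up; only the 'id' column comes from the text.

```python
import pandas as pd

# Hypothetical data frames that share the column 'id'
left = pd.DataFrame({"id": [1, 2, 3], "name": ["Amit", "Bina", "Chetan"]})
right = pd.DataFrame({"id": [2, 3, 4], "marks": [78, 85, 90]})

# merge() defaults to how='inner': only ids found in both frames survive
common = pd.merge(left, right, on="id")
print(common)
```

Here ids 1 and 4 are dropped, because each of them occurs in only one of the two data frames.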
Page 37 :
Join operation in data frame
It is used to merge data frames based on some common column/key.

1. Full Outer Join: The full outer join combines the results of both the left and the right outer joins. The joined data frame will contain all records from both data frames, with NaN filled in for missing matches on either side. You can perform a full outer join by specifying the how argument as 'outer' in the merge() function.

Example:
The resulting DataFrame has all the entries from both tables, with NaN values for missing matches on either side. One more thing to notice is the suffix appended to column names that occur in both DataFrames, to show which column came from which DataFrame. The default suffixes are _x and _y; you can change them by specifying the suffixes argument in the merge() function.
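The full outer join and the suffix behaviour can be sketched as follows. The two data frames and the custom suffixes "_a" and "_b" are made up for illustration.

```python
import pandas as pd

# Hypothetical data frames; both have a 'score' column besides the key 'id'
a = pd.DataFrame({"id": [1, 2, 3], "score": [50, 60, 70]})
b = pd.DataFrame({"id": [2, 3, 4], "score": [65, 75, 85]})

# Full outer join: every id from either frame, NaN where there is no match
full = pd.merge(a, b, on="id", how="outer")
print(full)   # the overlapping column appears as score_x and score_y

# The suffixes argument replaces the default _x / _y
named = pd.merge(a, b, on="id", how="outer", suffixes=("_a", "_b"))
print(named)
```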
Page 40 :
3. Right Join: The right join produces a complete set of records from data frame B (the right-side data frame), with the matching records (where available) from data frame A (the left-side data frame). If there is no match, the columns coming from the left side will contain null (NaN). You have to pass 'right' as the how argument of the merge() function.
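A right join can be sketched as below. The frames A and B follow the naming used in the text, but their contents are made up for illustration.

```python
import pandas as pd

# Hypothetical data: A is the left-side frame, B is the right-side frame
A = pd.DataFrame({"id": [1, 2, 3], "name": ["Amit", "Bina", "Chetan"]})
B = pd.DataFrame({"id": [2, 3, 4], "marks": [78, 85, 90]})

# Right join: every record of B is kept; id 4 has no match in A,
# so its 'name' (a left-side column) is NaN
right_joined = pd.merge(A, B, on="id", how="right")
print(right_joined)
```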
Page 41 :
4. Left Join: The left join produces a complete set of records from data frame A (the left-side data frame), with the matching records (where available) from data frame B (the right-side data frame). If there is no match, the columns coming from the right side will contain null (NaN). You have to pass 'left' as the how argument of the merge() function.
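A left join can be sketched in the same way. Again, the contents of frames A and B are made up for illustration.

```python
import pandas as pd

# Hypothetical data: A is the left-side frame, B is the right-side frame
A = pd.DataFrame({"id": [1, 2, 3], "name": ["Amit", "Bina", "Chetan"]})
B = pd.DataFrame({"id": [2, 3, 4], "marks": [78, 85, 90]})

# Left join: every record of A is kept; id 1 has no match in B,
# so its 'marks' (a right-side column) is NaN
left_joined = pd.merge(A, B, on="id", how="left")
print(left_joined)
```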
Page 43 :
CSV File
A CSV is a comma-separated values file, which allows data to be saved in a tabular format. A CSV file is a simple text file holding the kind of data found in a spreadsheet or a database table. Files in the CSV format can be imported into and exported from programs that store data in tables, such as Microsoft Excel or OpenOffice.
In a CSV file the data fields are most often separated, or delimited, by a comma. The data in each row are delimited by commas, and individual rows are separated by newlines.
To create a CSV file, first open a new file in your favorite text editor, such as Notepad. Then enter the text data you want the file to contain, separating each value with a comma and each row with a new line. Save the file with the extension .csv. You can then open the file using MS Excel or another spreadsheet program, which will display the data as a table.
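The layout described above (commas between values, a new line between rows) can be sketched in code as follows. The file name players.csv and the use of a temporary folder are assumptions made for the example, and pd.read_csv() is used here only to show that the text is read back as a table.

```python
import os
import tempfile

import pandas as pd

# CSV text: values separated by commas, rows separated by newlines
csv_text = (
    "PLAYERNAME,IPLTEAM,BASEPRICEINCR\n"
    "ROHIT,MI,13\n"
    "VIRAT,RCB,17\n"
    "HARDIK,MI,14\n"
)

# Save the text with the .csv extension (hypothetical file name and location)
path = os.path.join(tempfile.mkdtemp(), "players.csv")
with open(path, "w", newline="") as f:
    f.write(csv_text)

# Read the file back; the first row is taken as the column names
df = pd.read_csv(path)
print(df)
```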