Learn More On Advance Pandas and NumPy for Data Science (Part -II)

Welcome back to Part II of this article. I hope you all enjoyed the content covered in Part I; here we will continue in the same rhythm and learn more in that space.

We have covered Reshaping DataFrames and Combining DataFrames in Part I. 

Working with Categorical data: 

First, let's understand what categorical variables are in a given dataset.

Categorical variables take values from a limited, definite set of possible values, each of which can be labeled easily. The values may be numeric or string. Examples include Location, Designation, Grade of the students, Age (when grouped into bands), Sex, and many more.

Categorical data can be further divided into Ordinal Data and Nominal Data. Continuous Data is listed alongside them for contrast.

  1. Ordinal Data
    • These categorical variables have a proper inherent order (like Grade, Designation).
  2. Nominal Data
    • This is just the opposite of Ordinal Data: the categories have no inherent order (like Location).
  3. Continuous Data
    • Continuous variables/features can take an infinite number of values within certain boundaries. They are always of a numeric or date/time data type.
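The ordinal/nominal distinction can be made explicit in pandas. The following is a minimal sketch, assuming a small made-up DataFrame (the `Grade` and `Location` values and the chosen grade order are illustrative, not from any real dataset). It marks one column as an ordered categorical and the other as an unordered one.

```python
import pandas as pd

# Hypothetical sample data: student grades (ordinal) and locations (nominal)
df = pd.DataFrame({
    "Grade": ["B", "A", "C", "A", "B"],
    "Location": ["Chennai", "London", "Chennai", "Paris", "London"],
})

# Ordinal: declare the order explicitly, lowest to highest (assumed order)
grade_order = ["C", "B", "A"]
df["Grade"] = pd.Categorical(df["Grade"], categories=grade_order, ordered=True)

# Nominal: a plain (unordered) categorical
df["Location"] = df["Location"].astype("category")

# Integer codes follow the declared order for the ordinal column
grade_codes = df["Grade"].cat.codes.tolist()

# Comparisons are meaningful only for the ordered categorical
at_least_b = (df["Grade"] >= "B").tolist()

print(grade_codes)
print(at_least_b)
```

Declaring the order up front keeps later sorting and comparisons consistent with the meaning of the grades rather than with alphabetical order.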

What we are actually going to do is “encode” them, so that they can be taken forward for further analysis in the Data Science/Machine Learning life cycle.

It is commonly said that 70 – 80% of the effort in a Data Science project goes into data preparation and EDA. Cleaning and preparing the data is the major task; only after that can we select a stable model and finalize it.

Within that process, converting categorical data into numerical data is an unavoidable and necessary activity. This activity is called Encoding.

Encoding Techniques

  • One Hot Encoding
  • Dummy Encoding
  • Label Encoding
  • Ordinal Encoding
  • Target Encoding
  • Mean Encoding
  • One-Hot Encoding
    • As we know, feature engineering (FE) transforms the given data into a reasonable form that is easier to interpret, making the data more transparent and helping to build the model.
    • It also creates new features to enhance the model, and this is where the “One-Hot Encoding” methodology comes into the picture.
    • This technique can be used when the features are nominal.
    • In one-hot encoding, we create a new variable (feature/column) for each level of a categorical feature.
    • Each category is mapped to a binary variable, 1 or 0, based on its presence or absence.
  • Dummy Encoding
    • This scheme is similar to one-hot encoding.
    • It likewise transforms the categorical variable into binary variables (also known as dummy variables).
    • Dummy encoding is a small refinement of one-hot encoding.
    • It uses N-1 features to represent N labels/categories; the dropped category is represented by all zeros.

One-Hot Encoding

Code

import pandas as pd
import category_encoders as ce  # third-party package: pip install category_encoders

# Sample data: a single nominal column of fruit names
data = pd.DataFrame({"Fruits": ["Apple", "Banana", "Cherries", "Grapes", "Mango", "Banana", "Cherries", "Grapes", "Mango", "Apple"]})
data

# Create object for one-hot encoding
encoder = ce.OneHotEncoder(cols=["Fruits"], handle_unknown="return_nan", return_df=True, use_cat_names=True)
data_encoded = encoder.fit_transform(data)
data_encoded
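For comparison, the N versus N-1 difference between one-hot and dummy encoding can also be sketched with pandas alone. This is an illustrative alternative under the assumption that `pd.get_dummies` is acceptable for the task, not the method used in the code above. The `drop_first=True` option is what produces the N-1 columns.

```python
import pandas as pd

data = pd.DataFrame({"Fruits": ["Apple", "Banana", "Cherries", "Grapes", "Mango"]})

# One-hot encoding: one indicator column per category (N columns)
one_hot = pd.get_dummies(data, columns=["Fruits"], dtype=int)

# Dummy encoding: drop the first category, leaving N-1 columns
dummy = pd.get_dummies(data, columns=["Fruits"], drop_first=True, dtype=int)

print(one_hot.shape[1], dummy.shape[1])
print(dummy)
```

In the dummy-encoded frame, the row for the dropped category (here "Apple") contains only zeros, which is how it stays distinguishable from the others.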

Playing with DateTime Data

Whenever you're dealing with date and time data, you can use Python's built-in datetime module, whose objects pandas works with alongside its own datetime types. On top of that, the pandas to_datetime() function helps us convert multiple DataFrame columns into a single datetime column.
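As a minimal sketch of combining several columns with to_datetime(), assume a made-up DataFrame whose date is split across `year`, `month` and `day` columns (the values are illustrative only):

```python
import pandas as pd

# Hypothetical data with the date split across three columns
df = pd.DataFrame({
    "year": [2021, 2021, 2022],
    "month": [1, 6, 12],
    "day": [15, 30, 31],
})

# to_datetime assembles columns named year/month/day into one datetime column
df["date"] = pd.to_datetime(df[["year", "month", "day"]])

# The .dt accessor exposes parts of the combined column
df["weekday"] = df["date"].dt.day_name()

print(df)
```

The column names matter here: to_datetime() recognizes names such as year, month and day when it is given a DataFrame.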

List of Python datetime Classes

  • datetime – To manipulate dates and times together – year, month, day, hour, minute, second, microsecond.
  • date – To manipulate dates alone – year, month, day.
  • time – To manipulate time alone – hour, minute, second, microsecond.
  • timedelta – To represent a duration, i.e. the difference between two dates or times.
  • tzinfo – To deal with time zones.
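The classes listed above can be tried out with a short sketch like the following. The specific dates and times are arbitrary, and `timezone.utc` is used as a ready-made tzinfo implementation from the standard library.

```python
from datetime import date, datetime, time, timedelta, timezone

d = date(2021, 6, 30)                     # date only
t = time(14, 45, 0)                       # time only
dt = datetime.combine(d, t)               # date and time together

one_week_later = dt + timedelta(days=7)   # timedelta represents a duration
gap = date(2021, 12, 31) - d              # subtracting dates gives a timedelta

# timezone is a concrete tzinfo subclass from the standard library
dt_utc = dt.replace(tzinfo=timezone.utc)

print(one_week_later, gap.days, dt_utc.isoformat())
```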

Converting data types

Converting data types is common across all programming languages, and Python is no exception. Python provides type conversion functions to convert one data type to another.

Type Conversion in Python:

  • Explicit Type Conversion: The developer writes code to change the type of the data, as required by the flow of the program.
  • Implicit Type Conversion: Python converts the type automatically, without any manual involvement.

Explicit Type Conversion

Code

# Type conversion in Python
strA = "1999"  # String type

# converting the string to int (base 10) and printing it
intA = int(strA, 10)
print("Into integer :", intA)

# converting the string to float and printing it
floatA = float(strA)
print("into float :", floatA)

Output

Into integer : 1999
into float : 1999.0

# Type conversion in Python

# initializing string
strA = "Welcome"

ListA = list(strA)
print("Converting string to list :", ListA)

tupleA = tuple(strA)
print("Converting string to tuple :", tupleA)

Output

Converting string to list : ['W', 'e', 'l', 'c', 'o', 'm', 'e']
Converting string to tuple : ('W', 'e', 'l', 'c', 'o', 'm', 'e')

A few other functions

dict() : Used to convert a sequence of (key, value) tuples into a dictionary.
str() : Used to convert a value, such as an integer, into a string.
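A brief sketch of these two functions, using made-up values for illustration:

```python
# dict() builds a dictionary from a sequence of (key, value) tuples
pairs = (("name", "John"), ("age", 40))
person = dict(pairs)

# str() turns a number into its string form, e.g. for concatenation
age_text = "Age: " + str(person["age"])

print(person)
print(age_text)
```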

Implicit Type Conversion

a = 100
print("a is of type:", type(a))
b = 100.6
print("b is of type:", type(b))
c = a + b
print(c)
print("c is of type:", type(c))

Output

a is of type: <class 'int'>
b is of type: <class 'float'>
200.6
c is of type: <class 'float'>

Access Modifiers in Python: As we know, Python supports OOP. So, naturally, Public, Private and Protected access have to be there. Yes, of course!

Python access modifiers are used to restrict access to the variables and methods of a class. Python has no access keywords; instead, an UNDERSCORE ‘_’ prefix on the name signals the intended access level of a data member and/or method. This is a naming convention rather than strict enforcement.

  • Public Access Modifier:
    • By default, all data members and member functions are public and accessible from anywhere in the program.
  • Protected Access Modifier:
    • To declare a protected data member or member function, prefix its name with a single underscore ‘_’.
  • Private Access Modifier:
    • To declare a private data member or member function, prefix its name with a double underscore ‘__’. Python then mangles the name to make outside access harder.

Public Access Example

class Employee:
    def __init__(self, name, age):
        # public data members
        self.EmpName = name
        self.EmpAge = age

    # public member function displayEmpAge
    def displayEmpAge(self):
        # accessing public data member
        print("Age:", self.EmpAge)

# creating object of the class from Employee Class
objEmp = Employee("John", 40)

# accessing public data member
print("Name: Mr.", objEmp.EmpName)

# calling public member function of the class
objEmp.displayEmpAge()

OUTPUT

Name: Mr. John
Age: 40

Protected Access Example

Creating a super class and a derived class, then accessing protected members through a public member function.

# super class
class Employee:

    # protected data members
    _name = None
    _designation = None
    _salary = None

    # constructor
    def __init__(self, name, designation, salary):
        self._name = name
        self._designation = designation
        self._salary = salary

    # protected member function
    def _displayotherdetails(self):
        # accessing protected data members
        print("Designation:", self._designation)
        print("Salary:", self._salary)

# derived class
class Employee_A(Employee):

    # constructor
    def __init__(self, name, designation, salary):
        Employee.__init__(self, name, designation, salary)

    # public member function
    def displayDetails(self):
        # accessing protected data members of super class
        print("Name:", self._name)

        # accessing protected member functions of superclass
        self._displayotherdetails()

# creating objects of the derived class
obj = Employee_A("David", "Data Scientist", 5000)

# calling public member functions of the class
obj.displayDetails()

OUTPUT

Name: David
Designation: Data Scientist
Salary: 5000
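The examples above cover public and protected members. For the private (double underscore) case, a minimal sketch might look like the following. The class and attribute names are illustrative assumptions. It shows that Python renames such attributes (name mangling) rather than truly hiding them.

```python
class Employee:
    def __init__(self, name, salary):
        self.name = name          # public
        self.__salary = salary    # "private": name-mangled to _Employee__salary

    def display_salary(self):
        # Inside the class, the double-underscore name works as written
        return f"Salary: {self.__salary}"


emp = Employee("David", 5000)
print(emp.display_salary())

# Outside the class, the unmangled name does not exist
try:
    emp.__salary
    direct_access_failed = False
except AttributeError:
    direct_access_failed = True

# The mangled name is still reachable, so this is a convention, not enforcement
mangled_value = emp._Employee__salary
print(direct_access_failed, mangled_value)
```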

I hope we learned many things here that are useful for everyone. We still have to cover NumPy; I am pausing here and will continue in my next article(s). Thanks for your time and for reading this. Kindly leave your comments. Will connect shortly!

Until then, bye and see you soon – Cheers! Shanthababu.

Source Prolead brokers usa