Pandas data basics

3. Pandas data basics#

We will mostly use Pandas dataframe as our data type. This and the next chapter will introduce the dataframe in detail. Pandas tries to pick up the best features of Numpy, spreadsheets and relational databases, and actually succeeds mostly in it. Pandas is an unbelievably powerful data manipulation tool.

Before we start exploring Pandas, it is good to understand how the standard data types in Python work. So, let’s begin with them.

3.1. Standard data types#

3.1.1. List#

A list is probably the most used data type in Python. It is a variable-length collection of items. A list can be defined in Python using either square brackets or the list function. Lists in Python are mutable. You can change the values of a list afterwards.

example_list = [1,2,'three',4]

example_list[2]

'three'

example_list[2] = 3

example_list

[1, 2, 3, 4]

You can add a new element to the end of the list with append().

example_list.append(5)

example_list

[1, 2, 3, 4, 5]

You can also add an item to a specific location in the list using insert().

example_list.insert(3,3.5)

example_list

[1, 2, 3, 3.5, 4, 5]

With pop() you can remove items from the list.

example_list.pop(3)

3.5

example_list

[1, 2, 3, 4, 5]

You can remove a specific item from the list using remove().

example_list.remove(3)

example_list

[1, 2, 4, 5]

Lists are commonly used to acquire items from generators. Here is an example:

small_gen = (x**2 for x in range(10)) # a generator
list_of_gen = [item for item in small_gen]

list_of_gen

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

You can check if an item is or is not in the list with in and is not.

16 in list_of_gen

True

17 in list_of_gen

False

With + you can combine lists.

example_list + example_list

[1, 2, 4, 5, 1, 2, 4, 5]

You can extend a list with multiple elements with extend.

example_list.extend([6,7,8])

example_list

[1, 2, 4, 5, 6, 7, 8]

You can use sort() to sort a list. It organises the original list. Pandas, in contrary, creates a new object in this kind of operations.

unsorted_list = [7,3,6,4,8,2]

unsorted_list.sort()

unsorted_list

[2, 3, 4, 6, 7, 8]

Slicing is an important concept in Python. The sections of most sequency types can be selected with slice notation.

small_list = list(range(10))

small_list[1:3]

[1, 2]

small_list[0:3]

[0, 1, 2]

small_list[:3]

[0, 1, 2]

small_list[3:]

[3, 4, 5, 6, 7, 8, 9]

small_list[-3:]

[7, 8, 9]

small_list[-5:-3]

[5, 6]

Step can be used to define the frequency of slicing.

small_list[::2]

[0, 2, 4, 6, 8]

small_list[1:8:2]

[1, 3, 5, 7]

small_list[::-1]

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

3.1.2. Tuple#

A tuple is an immutable sequence of Python objects. Because it is immutable, it has also a fixed length. You can create tuples using parantheses.

small_tuple = (10,20,30,40)

small_tuple

(10, 20, 30, 40)

Otherwise, tuples are very similar to lists in Python. For example, you can access elements with square brackets.

small_tuple[2]

And you can concatenate them using +.

small_tuple + small_tuple

(10, 20, 30, 40, 10, 20, 30, 40)

But, they are immutable. You can not change values in them.

small_tuple[2] = 35

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-37-19b2de519515> in <module>
----> 1 small_tuple[2] = 35

TypeError: 'tuple' object does not support item assignment

Python will automatically unpack the tuple, if it is fed to a collection of variables (remember how a function returns a tuple).

a,b,c,d = small_tuple

Because a tuple is immutable, there are not many tuple methods. One useful is count, which counts the occurrences of a value.

small_tuple.count(40)

3.1.3. Dict#

Dict is the most complicated built-in data type in Python. It is a mutable collection of key-value pairs of Python objects. Dicts are created using curly braces.

sample_dict = {'a': 500, 'b' : 1000, 'c' : 1500, 'd': 2000}

You can access and add elements of a dict using brackes with a key inside them.

sample_dict['a']

sample_dict['e'] = 2500

sample_dict

{'a': 500, 'b': 1000, 'c': 1500, 'd': 2000, 'e': 2500}

There is no restrction that the keys should be strings and the values should be numbers.

sample_dict[6] = 'some_text'

sample_dict

{'a': 500, 'b': 1000, 'c': 1500, 'd': 2000, 'e': 2500, 6: 'some_text'}

You can delete an item from a dict using del.

del sample_dict[6]

sample_dict

{'a': 500, 'b': 1000, 'c': 1500, 'd': 2000, 'e': 2500}

The keys and values methods can be used to get iterators of the dict’s keys and values. (Remember that the list method was commonly used to extract the contents of an iterator.)

list(sample_dict.keys())

['a', 'b', 'c', 'd', 'e']

list(sample_dict.values())

[500, 1000, 1500, 2000, 2500]

You can merge two dictionaries with update.

sample_dict.update({'f':3000,'g':3500})

sample_dict

{'a': 500, 'b': 1000, 'c': 1500, 'd': 2000, 'e': 2500, 'f': 3000, 'g': 3500}

3.1.4. Set#

A set in Python is a collection of unique elements. They can be created with set or with curly braces.

set([5,5,5,6,6,6,7,7,7,8,8,8])

{5, 6, 7, 8}

{4,4,4,5,5,5,6,6,6,7,7,7}

{4, 5, 6, 7}

A Python set is very similar to a mathematical set, so it has methods for mathematical set operations.

x = {3,4,5,6,7,8}
y = {6,7,8,9,10}

x.union(y)

{3, 4, 5, 6, 7, 8, 9, 10}

x.intersection(y)

{6, 7, 8}

x.difference(y)

{3, 4, 5}

x.symmetric_difference(y)

{3, 4, 5, 9, 10}

3.1.5. Comprehensions#

Comprehensions, especially the list ones, are a great feature in Python. With them, you can efficiently create collections with a single statement. Let’s look some examples. (% is the module operator)

squares = [x**2 for x in range(10) if x%2==0]

squares

[0, 4, 16, 36, 64]

You could do exactly the same with a much longer for loop.

squares = []
for x in range(10):
    if x%2==0:
        squares.append(x**2)

squares

[0, 4, 16, 36, 64]

Here is another example:

fd = open('quotes.txt',encoding='utf-8')

[text.upper().rstrip() for text in fd if len(text) < 50]

['“JAVA IS, IN MANY WAYS, C++–.”(MICHAEL FELDMAN)',
 'EVERYTHING CAN BE IMPROVED.  ~CLARENCE W. BARRON',
 "DON'T WATER YOUR WEEDS.  ~HARVEY MACKAY",
 '"NECESSITY IS THE MOTHER OF INVENTION" - PLATO',
 '"INFORMATION IS NOT KNOWLEDGE." ALBERT EINSTEIN',
 'DELETED CODE IS DEBUGGED CODE.  — JEFF SICKEL',
 'THE BEST CODE IS NO CODE AT ALL.',
 'NO CODE IS FASTER THAN NO CODE.  — MERB MOTTO']

Python also has set and dict comprehensions that work similarly. Here is an example. First, we set our stream position to the beginning

fd.seek(0)

The following command creates a dict where the keys are the first five letters of each quote and the values are the lengths of the quotes (if the quote is less than 50 characters long).

{item[0:5] : len(item) for item in fd if len(item) < 50}

{'“Java': 48,
 'Every': 49,
 "Don't": 40,
 '"Nece': 47,
 '"Info': 48,
 'Delet': 46,
 'The b': 33,
 'No co': 46}

Nested list comprehensions are useful, but somewhat difficult to grasp. Let’s look some examples.

fd.seek(0)

In the following, we split the quotes to a list of words, creating a list of lists

quotes = [quote.rstrip().split(" ") for quote in fd if len(quote) < 45]

quotes

[["Don't", 'water', 'your', 'weeds.', '', '~Harvey', 'MacKay'],
 ['The', 'best', 'code', 'is', 'no', 'code', 'at', 'all.']]

Using a nested list comprehension, we can now collect individual words to a list.

[word for quote in quotes for word in quote]

["Don't",
 'water',
 'your',
 'weeds.',
 '',
 '~Harvey',
 'MacKay',
 'The',
 'best',
 'code',
 'is',
 'no',
 'code',
 'at',
 'all.']

Always remember to close the file.

fd.close()

3.2. Pandas data structures#

Two main data structures of Pandas are Series and Dataframe. Let’s analys both of them.

3.2.1. Pandas series#

Let’s start by importing Pandas.

import pandas as pd

A pandas series is a one-dimensional array of Numpy-type objects, and an associated array of labels, called index.

first_pandas = pd.Series([5,3,7,3,7,1])

first_pandas

  5
  3
  7
  3
  7
  1
dtype: int64

If we do not pass an index when defining a series, it will be given the default index of sequential values.

first_pandas.index

RangeIndex(start=0, stop=6, step=1)

You can define a custom index when creating a series.

second_pandas = pd.Series([1,2,3], index=['first','second','third'])

second_pandas

first     1
second    2
third     3
dtype: int64

A pandas series works like a dict. You can use the index value to pick up a single item.

second_pandas['second']

You can also use boolean indexing. Notice how the following command picks up the values on index locations 0, 2 and 4.

first_pandas[first_pandas > 4]

  5
  7
  7
dtype: int64

You can use Numpy operations to your Pandas series.

import numpy as np

first_pandas*2

  10
   6
  14
   6
  14
   2
dtype: int64

np.log(first_pandas)

  1.609438
  1.098612
  1.945910
  1.098612
  1.945910
  0.000000
dtype: float64

As you have noticed, Pandas Series is very similar to a Python dict. Actually, you can very easily create a Pandas series from a dict.

sample_dict = {'a': 500, 'b' : 1000, 'c' : 1500, 'd': 2000}

sample_series = pd.Series(sample_dict)

sample_series

a     500
b    1000
c    1500
d    2000
dtype: int64

Important feature in Pandas is that everything is aligned by the index.

sample_series2 = pd.Series([100,200,300,400], index=['c','a','d','b'])

sample_series + sample_series2

a     700
b    1400
c    1600
d    2300
dtype: int64

If you look carefully, you can see that the summing was done “index-wise”.

You can give a name to the values and index of your Series.

sample_series.name = 'some_values'
sample_series.index.name = 'some_letters'

sample_series

some_letters
a     500
b    1000
c    1500
d    2000
Name: some_values, dtype: int64

You can also change your index values very easily.

sample_series.index = ['aa','bb','cc','dd']

sample_series

aa     500
bb    1000
cc    1500
dd    2000
Name: some_values, dtype: int64

3.2.2. Dataframe#

Pandas Dataframe is a rectangular table of objects with similar features as Pandas Series. Each column can have different type of values. In a way, it is a 2-dimensional extension of Pandas Series. For example, it has both a row and a column index.

A dataframe can be constucted in many ways. One popular option is to use a dict of equal-length lists.

company_data = {'year': ['2015','2016','2017','2018','2019','2020'],
                'sales': [102000, 101000, 105000, 115000, 111000, 109000],
               'profit': [5000, 6000, 8000, 7000, 9000, 3000]}

company_df = pd.DataFrame(company_data)

company_df

	year	sales	profit
0	2015	102000	5000
1	2016	101000	6000
2	2017	105000	8000
3	2018	115000	7000
4	2019	111000	9000
5	2020	109000	3000

You can define the order of columns when creating a dataframe.

pd.DataFrame(company_data, columns = ['year','profit','sales'])

	year	profit	sales
0	2015	5000	102000
1	2016	6000	101000
2	2017	8000	105000
3	2018	7000	115000
4	2019	9000	111000
5	2020	3000	109000

If you add a column to the list that is not included in the dict, it will be added to the dataframe with missing values.

company_df = pd.DataFrame(company_data, columns = ['year','sales','profit','total_assets'])

company_df

	year	sales	profit	total_assets
0	2015	102000	5000	NaN
1	2016	101000	6000	NaN
2	2017	105000	8000	NaN
3	2018	115000	7000	NaN
4	2019	111000	9000	NaN
5	2020	109000	3000	NaN

You can pick up one column from a dataframe using the usual bracket-notation. It will return the column as a Pandas series.

company_df['profit']

  5000
  6000
  8000
  7000
  9000
  3000
Name: profit, dtype: int64

Value assignment to a dataframe is easy. You can either add one value that is repeated in every instance of a column.

company_df['total_assets'] = 105000

company_df

	year	sales	profit	total_assets
0	2015	102000	5000	105000
1	2016	101000	6000	105000
2	2017	105000	8000	105000
3	2018	115000	7000	105000
4	2019	111000	9000	105000
5	2020	109000	3000	105000

Or you can add a full list of values.

company_df['total_assets'] = np.random.randint(105000,120000,6)

company_df

	year	sales	profit	total_assets
0	2015	102000	5000	109004
1	2016	101000	6000	109098
2	2017	105000	8000	108761
3	2018	115000	7000	113786
4	2019	111000	9000	108386
5	2020	109000	3000	112777

Remember that if you add a Pandas series to a datarame, the values will be matched according to their index values! For missing index values, a NaN-value will be added.

debt_sr = pd.Series([9000,6000,8000,10000],index=[2,0,5,3])

debt_sr

   9000
   6000
   8000
  10000
dtype: int64

company_df['debt']  = debt_sr

company_df

	year	sales	profit	total_assets	debt
0	2015	102000	5000	109004	6000.0
1	2016	101000	6000	109098	NaN
2	2017	105000	8000	108761	9000.0
3	2018	115000	7000	113786	10000.0
4	2019	111000	9000	108386	NaN
5	2020	109000	3000	112777	8000.0

There are many ways to remove columns. One option is to use del that works also with other Python data types.

del company_df['debt']

company_df

	year	sales	profit	total_assets
0	2015	102000	5000	109004
1	2016	101000	6000	109098
2	2017	105000	8000	108761
3	2018	115000	7000	113786
4	2019	111000	9000	108386
5	2020	109000	3000	112777

A dict of dicts can also by used to create a dataframe. The keys of the outer dict are the columns and the keys of the inner dict are the index values. (Btw., In real applications we are not “writing” our data. Almost always we load the data from a file, using for example, Pandas read_csv(). So, do not worry.)

company_dict = {'company a': {2010: 10.9, 2011: 8.7, 2012: 9.5},'company b': {2010: 11.2, 2011: 9.6, 2012: 8.8}}

new_df = pd.DataFrame(company_dict)

new_df

	company a	company b
2010	10.9	11.2
2011	8.7	9.6
2012	9.5	8.8

Transposing data is done in the same way as in Numpy, just by adding .T after the name of the dataframe.

new_df.T

	2010	2011	2012
company a	10.9	8.7	9.5
company b	11.2	9.6	8.8

There are still many other ways to construct a dataframe, and we will use some of them in the following chapters. A detailed introduction to constructing a dataframe can be found here: pandas.pydata.org/pandas-docs/stable/user_guide/dsintro.html

There is actually a third data type in Pandas, Index objects. They are used to hold the labels of columns and index of a dataframe or series. A Pandas index is immutable and has similiraties with Python sets. However, the Pandas index can contain duplicate items.

new_df.index

Int64Index([2010, 2011, 2012], dtype='int64')

3.3. Basic Pandas functions#

Let’s use a more realistic data for the following examples.( More about read_csv() in the later chapters.)

big_companies_df = pd.read_csv('big_companies.csv',index_col='NAME',delimiter=';')

big_companies_df

	DIVIDEND YIELD	CASH FLOW/SALES	RETURN ON EQUITY - TOTAL (%)	RETURN ON INVESTED CAPITAL	P/E - RATIO	TOTAL DEBT % COMMON EQUITY
NAME
APPLE	0.71	28.01	55.92	27.65	35.4	119.40
SAUDI ARABIAN OIL	0.21	31.73	32.25	28.34	29.5	16.96
MICROSOFT	1.06	43.46	40.14	24.89	36.5	61.56
AMAZON.COM	0.00	14.60	21.95	12.97	121.1	83.31
ALIBABA GROUP HOLDING ADR 1:8	0.00	31.23	23.77	17.39	31.0	16.60
FACEBOOK CLASS A	0.00	41.11	19.96	19.94	32.0	0.47
TENCENT HOLDINGS	0.23	35.55	24.68	15.92	42.6	53.74
ALPHABET A	0.00	33.18	18.12	17.76	33.2	2.26
TESLA	0.00	11.21	-14.94	-1.61	1036.3	202.77
WALMART	1.54	4.88	20.22	12.55	26.6	72.95
JOHNSON & JOHNSON	2.71	23.83	25.36	17.38	26.7	46.57
TAIWAN SEMICON.MNFG.	2.31	57.91	20.94	18.96	24.7	11.89
NESTLE 'N'	2.47	16.85	23.05	14.21	23.3	71.42

With reindex(), you can rearrange the values of the dataframe. Let’s load Numpy.

import numpy as np

The following command shuffles the order of the index values (company names).

random_order = np.random.permutation(big_companies_df.index)

random_order

array(['MICROSOFT', 'APPLE', 'WALMART', 'ALPHABET A', "NESTLE 'N'",
       'TESLA', 'SAUDI ARABIAN OIL', 'TAIWAN SEMICON.MNFG.',
       'JOHNSON & JOHNSON', 'TENCENT HOLDINGS', 'AMAZON.COM',
       'FACEBOOK CLASS A', 'ALIBABA GROUP HOLDING ADR 1:8'], dtype=object)

big_companies_df.reindex(index = random_order)

	DIVIDEND YIELD	CASH FLOW/SALES	RETURN ON EQUITY - TOTAL (%)	RETURN ON INVESTED CAPITAL	P/E - RATIO	TOTAL DEBT % COMMON EQUITY
NAME
MICROSOFT	1.06	43.46	40.14	24.89	36.5	61.56
APPLE	0.71	28.01	55.92	27.65	35.4	119.40
WALMART	1.54	4.88	20.22	12.55	26.6	72.95
ALPHABET A	0.00	33.18	18.12	17.76	33.2	2.26
NESTLE 'N'	2.47	16.85	23.05	14.21	23.3	71.42
TESLA	0.00	11.21	-14.94	-1.61	1036.3	202.77
SAUDI ARABIAN OIL	0.21	31.73	32.25	28.34	29.5	16.96
TAIWAN SEMICON.MNFG.	2.31	57.91	20.94	18.96	24.7	11.89
JOHNSON & JOHNSON	2.71	23.83	25.36	17.38	26.7	46.57
TENCENT HOLDINGS	0.23	35.55	24.68	15.92	42.6	53.74
AMAZON.COM	0.00	14.60	21.95	12.97	121.1	83.31
FACEBOOK CLASS A	0.00	41.11	19.96	19.94	32.0	0.47
ALIBABA GROUP HOLDING ADR 1:8	0.00	31.23	23.77	17.39	31.0	16.60

You can also rearrange columns with reindex().

random_columns = np.random.permutation(big_companies_df.columns)

random_columns

array(['TOTAL DEBT % COMMON EQUITY', 'RETURN ON EQUITY - TOTAL (%)',
       'P/E - RATIO', 'CASH FLOW/SALES', 'RETURN ON INVESTED CAPITAL',
       'DIVIDEND YIELD'], dtype=object)

big_companies_df.reindex(columns = random_columns)

	TOTAL DEBT % COMMON EQUITY	RETURN ON EQUITY - TOTAL (%)	P/E - RATIO	CASH FLOW/SALES	RETURN ON INVESTED CAPITAL	DIVIDEND YIELD
NAME
APPLE	119.40	55.92	35.4	28.01	27.65	0.71
SAUDI ARABIAN OIL	16.96	32.25	29.5	31.73	28.34	0.21
MICROSOFT	61.56	40.14	36.5	43.46	24.89	1.06
AMAZON.COM	83.31	21.95	121.1	14.60	12.97	0.00
ALIBABA GROUP HOLDING ADR 1:8	16.60	23.77	31.0	31.23	17.39	0.00
FACEBOOK CLASS A	0.47	19.96	32.0	41.11	19.94	0.00
TENCENT HOLDINGS	53.74	24.68	42.6	35.55	15.92	0.23
ALPHABET A	2.26	18.12	33.2	33.18	17.76	0.00
TESLA	202.77	-14.94	1036.3	11.21	-1.61	0.00
WALMART	72.95	20.22	26.6	4.88	12.55	1.54
JOHNSON & JOHNSON	46.57	25.36	26.7	23.83	17.38	2.71
TAIWAN SEMICON.MNFG.	11.89	20.94	24.7	57.91	18.96	2.31
NESTLE 'N'	71.42	23.05	23.3	16.85	14.21	2.47

If we have names in the reindex list that are not included in the original index/columns, these are added to the dataframe with missing values. You can prevent those missing values, for example, with ffill in the parameter method (there is also bfill and nearest).

new_df = big_companies_df.reindex(['APPLE','MICROSOFT','TESLA'])

new_df

	DIVIDEND YIELD	CASH FLOW/SALES	RETURN ON EQUITY - TOTAL (%)	RETURN ON INVESTED CAPITAL	P/E - RATIO	TOTAL DEBT % COMMON EQUITY
NAME
APPLE	0.71	28.01	55.92	27.65	35.4	119.40
MICROSOFT	1.06	43.46	40.14	24.89	36.5	61.56
TESLA	0.00	11.21	-14.94	-1.61	1036.3	202.77

Ffill is a forward fill. With strings, it adds the values of the original dataframe forward, according to the alphabetical order of the strings. Because APPLE is after AMAZON.COM, ALPHABET and ALIBABA… in alphabetical order, NaN values are put to these new rows.

new_df.reindex(big_companies_df.index,method = 'ffill')

	DIVIDEND YIELD	CASH FLOW/SALES	RETURN ON EQUITY - TOTAL (%)	RETURN ON INVESTED CAPITAL	P/E - RATIO	TOTAL DEBT % COMMON EQUITY
NAME
APPLE	0.71	28.01	55.92	27.65	35.4	119.40
SAUDI ARABIAN OIL	1.06	43.46	40.14	24.89	36.5	61.56
MICROSOFT	1.06	43.46	40.14	24.89	36.5	61.56
AMAZON.COM	NaN	NaN	NaN	NaN	NaN	NaN
ALIBABA GROUP HOLDING ADR 1:8	NaN	NaN	NaN	NaN	NaN	NaN
FACEBOOK CLASS A	0.71	28.01	55.92	27.65	35.4	119.40
TENCENT HOLDINGS	1.06	43.46	40.14	24.89	36.5	61.56
ALPHABET A	NaN	NaN	NaN	NaN	NaN	NaN
TESLA	0.00	11.21	-14.94	-1.61	1036.3	202.77
WALMART	0.00	11.21	-14.94	-1.61	1036.3	202.77
JOHNSON & JOHNSON	0.71	28.01	55.92	27.65	35.4	119.40
TAIWAN SEMICON.MNFG.	1.06	43.46	40.14	24.89	36.5	61.56
NESTLE 'N'	1.06	43.46	40.14	24.89	36.5	61.56

It is easy to drop values by index values from a series/dataframe using drop(). By default, Pandas uses index. Defining axis=1, you can drop columns. Another option is to drop both using index and columns

big_companies_df.drop('APPLE')

	DIVIDEND YIELD	CASH FLOW/SALES	RETURN ON EQUITY - TOTAL (%)	RETURN ON INVESTED CAPITAL	P/E - RATIO	TOTAL DEBT % COMMON EQUITY
NAME
SAUDI ARABIAN OIL	0.21	31.73	32.25	28.34	29.5	16.96
MICROSOFT	1.06	43.46	40.14	24.89	36.5	61.56
AMAZON.COM	0.00	14.60	21.95	12.97	121.1	83.31
ALIBABA GROUP HOLDING ADR 1:8	0.00	31.23	23.77	17.39	31.0	16.60
FACEBOOK CLASS A	0.00	41.11	19.96	19.94	32.0	0.47
TENCENT HOLDINGS	0.23	35.55	24.68	15.92	42.6	53.74
ALPHABET A	0.00	33.18	18.12	17.76	33.2	2.26
TESLA	0.00	11.21	-14.94	-1.61	1036.3	202.77
WALMART	1.54	4.88	20.22	12.55	26.6	72.95
JOHNSON & JOHNSON	2.71	23.83	25.36	17.38	26.7	46.57
TAIWAN SEMICON.MNFG.	2.31	57.91	20.94	18.96	24.7	11.89
NESTLE 'N'	2.47	16.85	23.05	14.21	23.3	71.42

big_companies_df.drop('DIVIDEND YIELD',axis=1)

	CASH FLOW/SALES	RETURN ON EQUITY - TOTAL (%)	RETURN ON INVESTED CAPITAL	P/E - RATIO	TOTAL DEBT % COMMON EQUITY
NAME
APPLE	28.01	55.92	27.65	35.4	119.40
SAUDI ARABIAN OIL	31.73	32.25	28.34	29.5	16.96
MICROSOFT	43.46	40.14	24.89	36.5	61.56
AMAZON.COM	14.60	21.95	12.97	121.1	83.31
ALIBABA GROUP HOLDING ADR 1:8	31.23	23.77	17.39	31.0	16.60
FACEBOOK CLASS A	41.11	19.96	19.94	32.0	0.47
TENCENT HOLDINGS	35.55	24.68	15.92	42.6	53.74
ALPHABET A	33.18	18.12	17.76	33.2	2.26
TESLA	11.21	-14.94	-1.61	1036.3	202.77
WALMART	4.88	20.22	12.55	26.6	72.95
JOHNSON & JOHNSON	23.83	25.36	17.38	26.7	46.57
TAIWAN SEMICON.MNFG.	57.91	20.94	18.96	24.7	11.89
NESTLE 'N'	16.85	23.05	14.21	23.3	71.42

big_companies_df.drop(index=['APPLE','TESLA'],columns=['DIVIDEND YIELD','P/E - RATIO'])

	CASH FLOW/SALES	RETURN ON EQUITY - TOTAL (%)	RETURN ON INVESTED CAPITAL	TOTAL DEBT % COMMON EQUITY
NAME
SAUDI ARABIAN OIL	31.73	32.25	28.34	16.96
MICROSOFT	43.46	40.14	24.89	61.56
AMAZON.COM	14.60	21.95	12.97	83.31
ALIBABA GROUP HOLDING ADR 1:8	31.23	23.77	17.39	16.60
FACEBOOK CLASS A	41.11	19.96	19.94	0.47
TENCENT HOLDINGS	35.55	24.68	15.92	53.74
ALPHABET A	33.18	18.12	17.76	2.26
WALMART	4.88	20.22	12.55	72.95
JOHNSON & JOHNSON	23.83	25.36	17.38	46.57
TAIWAN SEMICON.MNFG.	57.91	20.94	18.96	11.89
NESTLE 'N'	16.85	23.05	14.21	71.42

An important thing to notice is that most of the Pandas methods are non-destructive. In the previous operations, we have made many manipulations to the big_companies_df dataframe. However, it is still the same.

big_companies_df

	DIVIDEND YIELD	CASH FLOW/SALES	RETURN ON EQUITY - TOTAL (%)	RETURN ON INVESTED CAPITAL	P/E - RATIO	TOTAL DEBT % COMMON EQUITY
NAME
APPLE	0.71	28.01	55.92	27.65	35.4	119.40
SAUDI ARABIAN OIL	0.21	31.73	32.25	28.34	29.5	16.96
MICROSOFT	1.06	43.46	40.14	24.89	36.5	61.56
AMAZON.COM	0.00	14.60	21.95	12.97	121.1	83.31
ALIBABA GROUP HOLDING ADR 1:8	0.00	31.23	23.77	17.39	31.0	16.60
FACEBOOK CLASS A	0.00	41.11	19.96	19.94	32.0	0.47
TENCENT HOLDINGS	0.23	35.55	24.68	15.92	42.6	53.74
ALPHABET A	0.00	33.18	18.12	17.76	33.2	2.26
TESLA	0.00	11.21	-14.94	-1.61	1036.3	202.77
WALMART	1.54	4.88	20.22	12.55	26.6	72.95
JOHNSON & JOHNSON	2.71	23.83	25.36	17.38	26.7	46.57
TAIWAN SEMICON.MNFG.	2.31	57.91	20.94	18.96	24.7	11.89
NESTLE 'N'	2.47	16.85	23.05	14.21	23.3	71.42

Pandas makes almost all manipulations so that the manipulated dataframe is returned as a new object. If you want to change the original dataframe, almost all the methods have inplace=True for that.

Indexing, selecting values and slicing Pandas dataframes is very similar to other Python objects. One important difference is that when using labels for slicing, the last label is inclusive.

big_companies_df['MICROSOFT':'TESLA']

	DIVIDEND YIELD	CASH FLOW/SALES	RETURN ON EQUITY - TOTAL (%)	RETURN ON INVESTED CAPITAL	P/E - RATIO	TOTAL DEBT % COMMON EQUITY
NAME
MICROSOFT	1.06	43.46	40.14	24.89	36.5	61.56
AMAZON.COM	0.00	14.60	21.95	12.97	121.1	83.31
ALIBABA GROUP HOLDING ADR 1:8	0.00	31.23	23.77	17.39	31.0	16.60
FACEBOOK CLASS A	0.00	41.11	19.96	19.94	32.0	0.47
TENCENT HOLDINGS	0.23	35.55	24.68	15.92	42.6	53.74
ALPHABET A	0.00	33.18	18.12	17.76	33.2	2.26
TESLA	0.00	11.21	-14.94	-1.61	1036.3	202.77

You can also pick up columns using double brackets, put the slicing is not possible.

big_companies_df[['DIVIDEND YIELD','CASH FLOW/SALES']]

	DIVIDEND YIELD	CASH FLOW/SALES
NAME
APPLE	0.71	28.01
SAUDI ARABIAN OIL	0.21	31.73
MICROSOFT	1.06	43.46
AMAZON.COM	0.00	14.60
ALIBABA GROUP HOLDING ADR 1:8	0.00	31.23
FACEBOOK CLASS A	0.00	41.11
TENCENT HOLDINGS	0.23	35.55
ALPHABET A	0.00	33.18
TESLA	0.00	11.21
WALMART	1.54	4.88
JOHNSON & JOHNSON	2.71	23.83
TAIWAN SEMICON.MNFG.	2.31	57.91
NESTLE 'N'	2.47	16.85

You can also pick up values using booleans.

big_companies_df[big_companies_df['P/E - RATIO'] > 40]

	DIVIDEND YIELD	CASH FLOW/SALES	RETURN ON EQUITY - TOTAL (%)	RETURN ON INVESTED CAPITAL	P/E - RATIO	TOTAL DEBT % COMMON EQUITY
NAME
AMAZON.COM	0.00	14.60	21.95	12.97	121.1	83.31
TENCENT HOLDINGS	0.23	35.55	24.68	15.92	42.6	53.74
TESLA	0.00	11.21	-14.94	-1.61	1036.3	202.77

You can build a boolean dataframe easily. In the following, we check if a value in the dataframe is larger than 10.

big_companies_df > 10

	DIVIDEND YIELD	CASH FLOW/SALES	RETURN ON EQUITY - TOTAL (%)	RETURN ON INVESTED CAPITAL	P/E - RATIO	TOTAL DEBT % COMMON EQUITY
NAME
APPLE	False	True	True	True	True	True
SAUDI ARABIAN OIL	False	True	True	True	True	True
MICROSOFT	False	True	True	True	True	True
AMAZON.COM	False	True	True	True	True	True
ALIBABA GROUP HOLDING ADR 1:8	False	True	True	True	True	True
FACEBOOK CLASS A	False	True	True	True	True	False
TENCENT HOLDINGS	False	True	True	True	True	True
ALPHABET A	False	True	True	True	True	False
TESLA	False	True	False	False	True	True
WALMART	False	False	True	True	True	True
JOHNSON & JOHNSON	False	True	True	True	True	True
TAIWAN SEMICON.MNFG.	False	True	True	True	True	True
NESTLE 'N'	False	True	True	True	True	True

To make the indexing more clear, Pandas has operators loc and iloc that can be used to select the rows and columns from a dataframe using Numpy-like notation. loc works with labels and iloc with integer positions.

big_companies_df

	DIVIDEND YIELD	CASH FLOW/SALES	RETURN ON EQUITY - TOTAL (%)	RETURN ON INVESTED CAPITAL	P/E - RATIO	TOTAL DEBT % COMMON EQUITY
NAME
APPLE	0.71	28.01	55.92	27.65	35.4	119.40
SAUDI ARABIAN OIL	0.21	31.73	32.25	28.34	29.5	16.96
MICROSOFT	1.06	43.46	40.14	24.89	36.5	61.56
AMAZON.COM	0.00	14.60	21.95	12.97	121.1	83.31
ALIBABA GROUP HOLDING ADR 1:8	0.00	31.23	23.77	17.39	31.0	16.60
FACEBOOK CLASS A	0.00	41.11	19.96	19.94	32.0	0.47
TENCENT HOLDINGS	0.23	35.55	24.68	15.92	42.6	53.74
ALPHABET A	0.00	33.18	18.12	17.76	33.2	2.26
TESLA	0.00	11.21	-14.94	-1.61	1036.3	202.77
WALMART	1.54	4.88	20.22	12.55	26.6	72.95
JOHNSON & JOHNSON	2.71	23.83	25.36	17.38	26.7	46.57
TAIWAN SEMICON.MNFG.	2.31	57.91	20.94	18.96	24.7	11.89
NESTLE 'N'	2.47	16.85	23.05	14.21	23.3	71.42

Notice that with loc we can use slicing also with columns.

big_companies_df.loc['TENCENT HOLDINGS':'TAIWAN SEMICON.MNFG.','DIVIDEND YIELD':'P/E - RATIO']

	DIVIDEND YIELD	CASH FLOW/SALES	RETURN ON EQUITY - TOTAL (%)	RETURN ON INVESTED CAPITAL	P/E - RATIO
NAME
TENCENT HOLDINGS	0.23	35.55	24.68	15.92	42.6
ALPHABET A	0.00	33.18	18.12	17.76	33.2
TESLA	0.00	11.21	-14.94	-1.61	1036.3
WALMART	1.54	4.88	20.22	12.55	26.6
JOHNSON & JOHNSON	2.71	23.83	25.36	17.38	26.7
TAIWAN SEMICON.MNFG.	2.31	57.91	20.94	18.96	24.7

big_companies_df.iloc[6:12,0:5]

	DIVIDEND YIELD	CASH FLOW/SALES	RETURN ON EQUITY - TOTAL (%)	RETURN ON INVESTED CAPITAL	P/E - RATIO
NAME
TENCENT HOLDINGS	0.23	35.55	24.68	15.92	42.6
ALPHABET A	0.00	33.18	18.12	17.76	33.2
TESLA	0.00	11.21	-14.94	-1.61	1036.3
WALMART	1.54	4.88	20.22	12.55	26.6
JOHNSON & JOHNSON	2.71	23.83	25.36	17.38	26.7
TAIWAN SEMICON.MNFG.	2.31	57.91	20.94	18.96	24.7

When adding dataframes, any index-column pairs that are missing from either dataframe, will be replaced with NaN-values. Let’s look an example. We create two dataframes with partial data. Both are missing one (different) row and one column.

partial_data_df = big_companies_df.drop(index='APPLE',columns='DIVIDEND YIELD')

partial_data_df

	CASH FLOW/SALES	RETURN ON EQUITY - TOTAL (%)	RETURN ON INVESTED CAPITAL	P/E - RATIO	TOTAL DEBT % COMMON EQUITY
NAME
SAUDI ARABIAN OIL	31.73	32.25	28.34	29.5	16.96
MICROSOFT	43.46	40.14	24.89	36.5	61.56
AMAZON.COM	14.60	21.95	12.97	121.1	83.31
ALIBABA GROUP HOLDING ADR 1:8	31.23	23.77	17.39	31.0	16.60
FACEBOOK CLASS A	41.11	19.96	19.94	32.0	0.47
TENCENT HOLDINGS	35.55	24.68	15.92	42.6	53.74
ALPHABET A	33.18	18.12	17.76	33.2	2.26
TESLA	11.21	-14.94	-1.61	1036.3	202.77
WALMART	4.88	20.22	12.55	26.6	72.95
JOHNSON & JOHNSON	23.83	25.36	17.38	26.7	46.57
TAIWAN SEMICON.MNFG.	57.91	20.94	18.96	24.7	11.89
NESTLE 'N'	16.85	23.05	14.21	23.3	71.42

partial_data2_df = big_companies_df.drop(index='TESLA',columns='CASH FLOW/SALES')

partial_data2_df

	DIVIDEND YIELD	RETURN ON EQUITY - TOTAL (%)	RETURN ON INVESTED CAPITAL	P/E - RATIO	TOTAL DEBT % COMMON EQUITY
NAME
APPLE	0.71	55.92	27.65	35.4	119.40
SAUDI ARABIAN OIL	0.21	32.25	28.34	29.5	16.96
MICROSOFT	1.06	40.14	24.89	36.5	61.56
AMAZON.COM	0.00	21.95	12.97	121.1	83.31
ALIBABA GROUP HOLDING ADR 1:8	0.00	23.77	17.39	31.0	16.60
FACEBOOK CLASS A	0.00	19.96	19.94	32.0	0.47
TENCENT HOLDINGS	0.23	24.68	15.92	42.6	53.74
ALPHABET A	0.00	18.12	17.76	33.2	2.26
WALMART	1.54	20.22	12.55	26.6	72.95
JOHNSON & JOHNSON	2.71	25.36	17.38	26.7	46.57
TAIWAN SEMICON.MNFG.	2.31	20.94	18.96	24.7	11.89
NESTLE 'N'	2.47	23.05	14.21	23.3	71.42

Those cells that are found in both dataframes have their values summed. If a specific cell is only in one of the dataframes, a NaN value is inserted to that location in the resulting dataframe.

partial_data_df + partial_data2_df

	CASH FLOW/SALES	DIVIDEND YIELD	P/E - RATIO	RETURN ON EQUITY - TOTAL (%)	RETURN ON INVESTED CAPITAL	TOTAL DEBT % COMMON EQUITY
NAME
ALIBABA GROUP HOLDING ADR 1:8	NaN	NaN	62.0	47.54	34.78	33.20
ALPHABET A	NaN	NaN	66.4	36.24	35.52	4.52
AMAZON.COM	NaN	NaN	242.2	43.90	25.94	166.62
APPLE	NaN	NaN	NaN	NaN	NaN	NaN
FACEBOOK CLASS A	NaN	NaN	64.0	39.92	39.88	0.94
JOHNSON & JOHNSON	NaN	NaN	53.4	50.72	34.76	93.14
MICROSOFT	NaN	NaN	73.0	80.28	49.78	123.12
NESTLE 'N'	NaN	NaN	46.6	46.10	28.42	142.84
SAUDI ARABIAN OIL	NaN	NaN	59.0	64.50	56.68	33.92
TAIWAN SEMICON.MNFG.	NaN	NaN	49.4	41.88	37.92	23.78
TENCENT HOLDINGS	NaN	NaN	85.2	49.36	31.84	107.48
TESLA	NaN	NaN	NaN	NaN	NaN	NaN
WALMART	NaN	NaN	53.2	40.44	25.10	145.90

If you want to have some other than NaN values to these locations, you can use the add() method with the fill_value parameter. This will replace the missing values with a specified value. Notice how APPLE CASH FLOW/SALES and TESLA/DIVIDEND YIELD still have missing values. This is because they are missing from both partial dataframes.

partial_data_df.add(partial_data2_df,fill_value=0)

	CASH FLOW/SALES	DIVIDEND YIELD	P/E - RATIO	RETURN ON EQUITY - TOTAL (%)	RETURN ON INVESTED CAPITAL	TOTAL DEBT % COMMON EQUITY
NAME
ALIBABA GROUP HOLDING ADR 1:8	31.23	0.00	62.0	47.54	34.78	33.20
ALPHABET A	33.18	0.00	66.4	36.24	35.52	4.52
AMAZON.COM	14.60	0.00	242.2	43.90	25.94	166.62
APPLE	NaN	0.71	35.4	55.92	27.65	119.40
FACEBOOK CLASS A	41.11	0.00	64.0	39.92	39.88	0.94
JOHNSON & JOHNSON	23.83	2.71	53.4	50.72	34.76	93.14
MICROSOFT	43.46	1.06	73.0	80.28	49.78	123.12
NESTLE 'N'	16.85	2.47	46.6	46.10	28.42	142.84
SAUDI ARABIAN OIL	31.73	0.21	59.0	64.50	56.68	33.92
TAIWAN SEMICON.MNFG.	57.91	2.31	49.4	41.88	37.92	23.78
TENCENT HOLDINGS	35.55	0.23	85.2	49.36	31.84	107.48
TESLA	11.21	NaN	1036.3	-14.94	-1.61	202.77
WALMART	4.88	1.54	53.2	40.44	25.10	145.90

You can also do arithmetic operations between a dataframe and a Pandas series. One thing to note is that these operations use broadcasting like Numpy arrays. Broadcasting is a very important concept in ML. You can read more about it here. numpy.org/doc/stable/user/basics.broadcasting.html

In its simplest, broadcasting means that if we are subtracting a Series from a Dataframe, the Series is subtracted from every row (or column) of the dataframe.

In the following, we subtract the values of TESLA from every row of our dataframe.

sample_series = big_companies_df.loc['TESLA']

sample_series

DIVIDEND YIELD                     0.00
CASH FLOW/SALES                   11.21
RETURN ON EQUITY - TOTAL (%)     -14.94
RETURN ON INVESTED CAPITAL        -1.61
P/E - RATIO                     1036.30
TOTAL DEBT % COMMON EQUITY       202.77
Name: TESLA, dtype: float64

The index of the series is matched with the columns of the dataframe.

big_companies_df-sample_series

	DIVIDEND YIELD	CASH FLOW/SALES	RETURN ON EQUITY - TOTAL (%)	RETURN ON INVESTED CAPITAL	P/E - RATIO	TOTAL DEBT % COMMON EQUITY
NAME
APPLE	0.71	16.80	70.86	29.26	-1000.9	-83.37
SAUDI ARABIAN OIL	0.21	20.52	47.19	29.95	-1006.8	-185.81
MICROSOFT	1.06	32.25	55.08	26.50	-999.8	-141.21
AMAZON.COM	0.00	3.39	36.89	14.58	-915.2	-119.46
ALIBABA GROUP HOLDING ADR 1:8	0.00	20.02	38.71	19.00	-1005.3	-186.17
FACEBOOK CLASS A	0.00	29.90	34.90	21.55	-1004.3	-202.30
TENCENT HOLDINGS	0.23	24.34	39.62	17.53	-993.7	-149.03
ALPHABET A	0.00	21.97	33.06	19.37	-1003.1	-200.51
TESLA	0.00	0.00	0.00	0.00	0.0	0.00
WALMART	1.54	-6.33	35.16	14.16	-1009.7	-129.82
JOHNSON & JOHNSON	2.71	12.62	40.30	18.99	-1009.6	-156.20
TAIWAN SEMICON.MNFG.	2.31	46.70	35.88	20.57	-1011.6	-190.88
NESTLE 'N'	2.47	5.64	37.99	15.82	-1013.0	-131.35

If the index/columns do not match, the resulting dataframe is a union.

sample_series = sample_series.append(pd.Series(1,index=['extra row']))

big_companies_df-sample_series

	CASH FLOW/SALES	DIVIDEND YIELD	P/E - RATIO	RETURN ON EQUITY - TOTAL (%)	RETURN ON INVESTED CAPITAL	TOTAL DEBT % COMMON EQUITY	extra row
NAME
APPLE	16.80	0.71	-1000.9	70.86	29.26	-83.37	NaN
SAUDI ARABIAN OIL	20.52	0.21	-1006.8	47.19	29.95	-185.81	NaN
MICROSOFT	32.25	1.06	-999.8	55.08	26.50	-141.21	NaN
AMAZON.COM	3.39	0.00	-915.2	36.89	14.58	-119.46	NaN
ALIBABA GROUP HOLDING ADR 1:8	20.02	0.00	-1005.3	38.71	19.00	-186.17	NaN
FACEBOOK CLASS A	29.90	0.00	-1004.3	34.90	21.55	-202.30	NaN
TENCENT HOLDINGS	24.34	0.23	-993.7	39.62	17.53	-149.03	NaN
ALPHABET A	21.97	0.00	-1003.1	33.06	19.37	-200.51	NaN
TESLA	0.00	0.00	0.0	0.00	0.00	0.00	NaN
WALMART	-6.33	1.54	-1009.7	35.16	14.16	-129.82	NaN
JOHNSON & JOHNSON	12.62	2.71	-1009.6	40.30	18.99	-156.20	NaN
TAIWAN SEMICON.MNFG.	46.70	2.31	-1011.6	35.88	20.57	-190.88	NaN
NESTLE 'N'	5.64	2.47	-1013.0	37.99	15.82	-131.35	NaN

Basically all the Numpy’s element-wise array methods work with Pandas objects.

Tesla looks a little bit better now (no negative returns). :)

np.abs(big_companies_df)

	DIVIDEND YIELD	CASH FLOW/SALES	RETURN ON EQUITY - TOTAL (%)	RETURN ON INVESTED CAPITAL	P/E - RATIO	TOTAL DEBT % COMMON EQUITY
NAME
APPLE	0.71	28.01	55.92	27.65	35.4	119.40
SAUDI ARABIAN OIL	0.21	31.73	32.25	28.34	29.5	16.96
MICROSOFT	1.06	43.46	40.14	24.89	36.5	61.56
AMAZON.COM	0.00	14.60	21.95	12.97	121.1	83.31
ALIBABA GROUP HOLDING ADR 1:8	0.00	31.23	23.77	17.39	31.0	16.60
FACEBOOK CLASS A	0.00	41.11	19.96	19.94	32.0	0.47
TENCENT HOLDINGS	0.23	35.55	24.68	15.92	42.6	53.74
ALPHABET A	0.00	33.18	18.12	17.76	33.2	2.26
TESLA	0.00	11.21	14.94	1.61	1036.3	202.77
WALMART	1.54	4.88	20.22	12.55	26.6	72.95
JOHNSON & JOHNSON	2.71	23.83	25.36	17.38	26.7	46.57
TAIWAN SEMICON.MNFG.	2.31	57.91	20.94	18.96	24.7	11.89
NESTLE 'N'	2.47	16.85	23.05	14.21	23.3	71.42

If you want to apply a function to rows or columns of a dataframe, you can use apply.

differ = lambda x:np.max(x)-np.min(x)

differ calculates the difference between the maximum and minimum of each column.

big_companies_df.apply(differ)

DIVIDEND YIELD                     2.71
CASH FLOW/SALES                   53.03
RETURN ON EQUITY - TOTAL (%)      70.86
RETURN ON INVESTED CAPITAL        29.95
P/E - RATIO                     1013.00
TOTAL DEBT % COMMON EQUITY       202.30
dtype: float64

We can also do this row-wise.

big_companies_df.apply(differ,axis=1)

NAME
APPLE                             118.69
SAUDI ARABIAN OIL                  32.04
MICROSOFT                          60.50
AMAZON.COM                        121.10
ALIBABA GROUP HOLDING ADR 1:8      31.23
FACEBOOK CLASS A                   41.11
TENCENT HOLDINGS                   53.51
ALPHABET A                         33.20
TESLA                            1051.24
WALMART                            71.41
JOHNSON & JOHNSON                  43.86
TAIWAN SEMICON.MNFG.               55.60
NESTLE 'N'                         68.95
dtype: float64

For element-wise manipulations, you can use applymap().

big_companies_df.applymap(lambda x: -x)

	DIVIDEND YIELD	CASH FLOW/SALES	RETURN ON EQUITY - TOTAL (%)	RETURN ON INVESTED CAPITAL	P/E - RATIO	TOTAL DEBT % COMMON EQUITY
NAME
APPLE	-0.71	-28.01	-55.92	-27.65	-35.4	-119.40
SAUDI ARABIAN OIL	-0.21	-31.73	-32.25	-28.34	-29.5	-16.96
MICROSOFT	-1.06	-43.46	-40.14	-24.89	-36.5	-61.56
AMAZON.COM	-0.00	-14.60	-21.95	-12.97	-121.1	-83.31
ALIBABA GROUP HOLDING ADR 1:8	-0.00	-31.23	-23.77	-17.39	-31.0	-16.60
FACEBOOK CLASS A	-0.00	-41.11	-19.96	-19.94	-32.0	-0.47
TENCENT HOLDINGS	-0.23	-35.55	-24.68	-15.92	-42.6	-53.74
ALPHABET A	-0.00	-33.18	-18.12	-17.76	-33.2	-2.26
TESLA	-0.00	-11.21	14.94	1.61	-1036.3	-202.77
WALMART	-1.54	-4.88	-20.22	-12.55	-26.6	-72.95
JOHNSON & JOHNSON	-2.71	-23.83	-25.36	-17.38	-26.7	-46.57
TAIWAN SEMICON.MNFG.	-2.31	-57.91	-20.94	-18.96	-24.7	-11.89
NESTLE 'N'	-2.47	-16.85	-23.05	-14.21	-23.3	-71.42

sort_index can be used for sorting.

big_companies_df.sort_index()

	DIVIDEND YIELD	CASH FLOW/SALES	RETURN ON EQUITY - TOTAL (%)	RETURN ON INVESTED CAPITAL	P/E - RATIO	TOTAL DEBT % COMMON EQUITY
NAME
ALIBABA GROUP HOLDING ADR 1:8	0.00	31.23	23.77	17.39	31.0	16.60
ALPHABET A	0.00	33.18	18.12	17.76	33.2	2.26
AMAZON.COM	0.00	14.60	21.95	12.97	121.1	83.31
APPLE	0.71	28.01	55.92	27.65	35.4	119.40
FACEBOOK CLASS A	0.00	41.11	19.96	19.94	32.0	0.47
JOHNSON & JOHNSON	2.71	23.83	25.36	17.38	26.7	46.57
MICROSOFT	1.06	43.46	40.14	24.89	36.5	61.56
NESTLE 'N'	2.47	16.85	23.05	14.21	23.3	71.42
SAUDI ARABIAN OIL	0.21	31.73	32.25	28.34	29.5	16.96
TAIWAN SEMICON.MNFG.	2.31	57.91	20.94	18.96	24.7	11.89
TENCENT HOLDINGS	0.23	35.55	24.68	15.92	42.6	53.74
TESLA	0.00	11.21	-14.94	-1.61	1036.3	202.77
WALMART	1.54	4.88	20.22	12.55	26.6	72.95

big_companies_df.sort_index(axis=1)

	CASH FLOW/SALES	DIVIDEND YIELD	P/E - RATIO	RETURN ON EQUITY - TOTAL (%)	RETURN ON INVESTED CAPITAL	TOTAL DEBT % COMMON EQUITY
NAME
APPLE	28.01	0.71	35.4	55.92	27.65	119.40
SAUDI ARABIAN OIL	31.73	0.21	29.5	32.25	28.34	16.96
MICROSOFT	43.46	1.06	36.5	40.14	24.89	61.56
AMAZON.COM	14.60	0.00	121.1	21.95	12.97	83.31
ALIBABA GROUP HOLDING ADR 1:8	31.23	0.00	31.0	23.77	17.39	16.60
FACEBOOK CLASS A	41.11	0.00	32.0	19.96	19.94	0.47
TENCENT HOLDINGS	35.55	0.23	42.6	24.68	15.92	53.74
ALPHABET A	33.18	0.00	33.2	18.12	17.76	2.26
TESLA	11.21	0.00	1036.3	-14.94	-1.61	202.77
WALMART	4.88	1.54	26.6	20.22	12.55	72.95
JOHNSON & JOHNSON	23.83	2.71	26.7	25.36	17.38	46.57
TAIWAN SEMICON.MNFG.	57.91	2.31	24.7	20.94	18.96	11.89
NESTLE 'N'	16.85	2.47	23.3	23.05	14.21	71.42

If you want sort by values, you can use sort_values()

big_companies_df.sort_values(by = 'P/E - RATIO',ascending=False)

	DIVIDEND YIELD	CASH FLOW/SALES	RETURN ON EQUITY - TOTAL (%)	RETURN ON INVESTED CAPITAL	P/E - RATIO	TOTAL DEBT % COMMON EQUITY
NAME
TESLA	0.00	11.21	-14.94	-1.61	1036.3	202.77
AMAZON.COM	0.00	14.60	21.95	12.97	121.1	83.31
TENCENT HOLDINGS	0.23	35.55	24.68	15.92	42.6	53.74
MICROSOFT	1.06	43.46	40.14	24.89	36.5	61.56
APPLE	0.71	28.01	55.92	27.65	35.4	119.40
ALPHABET A	0.00	33.18	18.12	17.76	33.2	2.26
FACEBOOK CLASS A	0.00	41.11	19.96	19.94	32.0	0.47
ALIBABA GROUP HOLDING ADR 1:8	0.00	31.23	23.77	17.39	31.0	16.60
SAUDI ARABIAN OIL	0.21	31.73	32.25	28.34	29.5	16.96
JOHNSON & JOHNSON	2.71	23.83	25.36	17.38	26.7	46.57
WALMART	1.54	4.88	20.22	12.55	26.6	72.95
TAIWAN SEMICON.MNFG.	2.31	57.91	20.94	18.96	24.7	11.89
NESTLE 'N'	2.47	16.85	23.05	14.21	23.3	71.42

You can use many colums when sorting by values.

big_companies_df.sort_values(by = ['DIVIDEND YIELD','P/E - RATIO'],ascending=[True,False])

	DIVIDEND YIELD	CASH FLOW/SALES	RETURN ON EQUITY - TOTAL (%)	RETURN ON INVESTED CAPITAL	P/E - RATIO	TOTAL DEBT % COMMON EQUITY
NAME
TESLA	0.00	11.21	-14.94	-1.61	1036.3	202.77
AMAZON.COM	0.00	14.60	21.95	12.97	121.1	83.31
ALPHABET A	0.00	33.18	18.12	17.76	33.2	2.26
FACEBOOK CLASS A	0.00	41.11	19.96	19.94	32.0	0.47
ALIBABA GROUP HOLDING ADR 1:8	0.00	31.23	23.77	17.39	31.0	16.60
SAUDI ARABIAN OIL	0.21	31.73	32.25	28.34	29.5	16.96
TENCENT HOLDINGS	0.23	35.55	24.68	15.92	42.6	53.74
APPLE	0.71	28.01	55.92	27.65	35.4	119.40
MICROSOFT	1.06	43.46	40.14	24.89	36.5	61.56
WALMART	1.54	4.88	20.22	12.55	26.6	72.95
TAIWAN SEMICON.MNFG.	2.31	57.91	20.94	18.96	24.7	11.89
NESTLE 'N'	2.47	16.85	23.05	14.21	23.3	71.42
JOHNSON & JOHNSON	2.71	23.83	25.36	17.38	26.7	46.57