3. Pandas data basics#

We will mostly use Pandas dataframe as our data type. This and the next chapter will introduce the dataframe in detail. Pandas tries to pick up the best features of Numpy, spreadsheets and relational databases, and actually succeeds mostly in it. Pandas is an unbelievably powerful data manipulation tool.

Before we start exploring Pandas, it is good to understand how the standard data types in Python work. So, let’s begin with them.

3.1. Standard data types#

3.1.1. List#

A list is probably the most used data type in Python. It is a variable-length collection of items. A list can be defined in Python using either square brackets or the list function. Lists in Python are mutable. You can change the values of a list afterwards.

example_list = [1,2,'three',4]
example_list[2]
'three'
example_list[2] = 3
example_list
[1, 2, 3, 4]

You can add a new element to the end of the list with append().

example_list.append(5)
example_list
[1, 2, 3, 4, 5]

You can also add an item to a specific location in the list using insert().

example_list.insert(3,3.5)
example_list
[1, 2, 3, 3.5, 4, 5]

With pop() you can remove items from the list.

example_list.pop(3)
3.5
example_list
[1, 2, 3, 4, 5]

You can remove a specific item from the list using remove().

example_list.remove(3)
example_list
[1, 2, 4, 5]

Lists are commonly used to acquire items from generators. Here is an example:

small_gen = (x**2 for x in range(10)) # a generator
list_of_gen = [item for item in small_gen]
list_of_gen
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

You can check if an item is or is not in the list with in and is not.

16 in list_of_gen
True
17 in list_of_gen
False

With + you can combine lists.

example_list + example_list
[1, 2, 4, 5, 1, 2, 4, 5]

You can extend a list with multiple elements with extend.

example_list.extend([6,7,8])
example_list
[1, 2, 4, 5, 6, 7, 8]

You can use sort() to sort a list. It organises the original list. Pandas, in contrary, creates a new object in this kind of operations.

unsorted_list = [7,3,6,4,8,2]
unsorted_list.sort()
unsorted_list
[2, 3, 4, 6, 7, 8]

Slicing is an important concept in Python. The sections of most sequency types can be selected with slice notation.

small_list = list(range(10))
small_list[1:3]
[1, 2]
small_list[0:3]
[0, 1, 2]
small_list[:3]
[0, 1, 2]
small_list[3:]
[3, 4, 5, 6, 7, 8, 9]
small_list[-3:]
[7, 8, 9]
small_list[-5:-3]
[5, 6]

Step can be used to define the frequency of slicing.

small_list[::2]
[0, 2, 4, 6, 8]
small_list[1:8:2]
[1, 3, 5, 7]
small_list[::-1]
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

3.1.2. Tuple#

A tuple is an immutable sequence of Python objects. Because it is immutable, it has also a fixed length. You can create tuples using parantheses.

small_tuple = (10,20,30,40)
small_tuple
(10, 20, 30, 40)

Otherwise, tuples are very similar to lists in Python. For example, you can access elements with square brackets.

small_tuple[2]
30

And you can concatenate them using +.

small_tuple + small_tuple
(10, 20, 30, 40, 10, 20, 30, 40)

But, they are immutable. You can not change values in them.

small_tuple[2] = 35
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-37-19b2de519515> in <module>
----> 1 small_tuple[2] = 35

TypeError: 'tuple' object does not support item assignment

Python will automatically unpack the tuple, if it is fed to a collection of variables (remember how a function returns a tuple).

a,b,c,d = small_tuple
a
10
b
20

Because a tuple is immutable, there are not many tuple methods. One useful is count, which counts the occurrences of a value.

small_tuple.count(40)
1

3.1.3. Dict#

Dict is the most complicated built-in data type in Python. It is a mutable collection of key-value pairs of Python objects. Dicts are created using curly braces.

sample_dict = {'a': 500, 'b' : 1000, 'c' : 1500, 'd': 2000}

You can access and add elements of a dict using brackes with a key inside them.

sample_dict['a']
500
sample_dict['e'] = 2500
sample_dict
{'a': 500, 'b': 1000, 'c': 1500, 'd': 2000, 'e': 2500}

There is no restrction that the keys should be strings and the values should be numbers.

sample_dict[6] = 'some_text'
sample_dict
{'a': 500, 'b': 1000, 'c': 1500, 'd': 2000, 'e': 2500, 6: 'some_text'}

You can delete an item from a dict using del.

del sample_dict[6]
sample_dict
{'a': 500, 'b': 1000, 'c': 1500, 'd': 2000, 'e': 2500}

The keys and values methods can be used to get iterators of the dict’s keys and values. (Remember that the list method was commonly used to extract the contents of an iterator.)

list(sample_dict.keys())
['a', 'b', 'c', 'd', 'e']
list(sample_dict.values())
[500, 1000, 1500, 2000, 2500]

You can merge two dictionaries with update.

sample_dict.update({'f':3000,'g':3500})
sample_dict
{'a': 500, 'b': 1000, 'c': 1500, 'd': 2000, 'e': 2500, 'f': 3000, 'g': 3500}

3.1.4. Set#

A set in Python is a collection of unique elements. They can be created with set or with curly braces.

set([5,5,5,6,6,6,7,7,7,8,8,8])
{5, 6, 7, 8}
{4,4,4,5,5,5,6,6,6,7,7,7}
{4, 5, 6, 7}

A Python set is very similar to a mathematical set, so it has methods for mathematical set operations.

x = {3,4,5,6,7,8}
y = {6,7,8,9,10}
x.union(y)
{3, 4, 5, 6, 7, 8, 9, 10}
x.intersection(y)
{6, 7, 8}
x.difference(y)
{3, 4, 5}
x.symmetric_difference(y)
{3, 4, 5, 9, 10}

3.1.5. Comprehensions#

Comprehensions, especially the list ones, are a great feature in Python. With them, you can efficiently create collections with a single statement. Let’s look some examples. (% is the module operator)

squares = [x**2 for x in range(10) if x%2==0]
squares
[0, 4, 16, 36, 64]

You could do exactly the same with a much longer for loop.

squares = []
for x in range(10):
    if x%2==0:
        squares.append(x**2)
squares
[0, 4, 16, 36, 64]

Here is another example:

fd = open('quotes.txt',encoding='utf-8')
[text.upper().rstrip() for text in fd if len(text) < 50]
['“JAVA IS, IN MANY WAYS, C++–.”(MICHAEL FELDMAN)',
 'EVERYTHING CAN BE IMPROVED.  ~CLARENCE W. BARRON',
 "DON'T WATER YOUR WEEDS.  ~HARVEY MACKAY",
 '"NECESSITY IS THE MOTHER OF INVENTION" - PLATO',
 '"INFORMATION IS NOT KNOWLEDGE." ALBERT EINSTEIN',
 'DELETED CODE IS DEBUGGED CODE.  — JEFF SICKEL',
 'THE BEST CODE IS NO CODE AT ALL.',
 'NO CODE IS FASTER THAN NO CODE.  — MERB MOTTO']

Python also has set and dict comprehensions that work similarly. Here is an example. First, we set our stream position to the beginning

fd.seek(0)
0

The following command creates a dict where the keys are the first five letters of each quote and the values are the lengths of the quotes (if the quote is less than 50 characters long).

{item[0:5] : len(item) for item in fd if len(item) < 50}
{'“Java': 48,
 'Every': 49,
 "Don't": 40,
 '"Nece': 47,
 '"Info': 48,
 'Delet': 46,
 'The b': 33,
 'No co': 46}

Nested list comprehensions are useful, but somewhat difficult to grasp. Let’s look some examples.

fd.seek(0)
0

In the following, we split the quotes to a list of words, creating a list of lists

quotes = [quote.rstrip().split(" ") for quote in fd if len(quote) < 45]
quotes
[["Don't", 'water', 'your', 'weeds.', '', '~Harvey', 'MacKay'],
 ['The', 'best', 'code', 'is', 'no', 'code', 'at', 'all.']]

Using a nested list comprehension, we can now collect individual words to a list.

[word for quote in quotes for word in quote]
["Don't",
 'water',
 'your',
 'weeds.',
 '',
 '~Harvey',
 'MacKay',
 'The',
 'best',
 'code',
 'is',
 'no',
 'code',
 'at',
 'all.']

Always remember to close the file.

fd.close()

3.2. Pandas data structures#

Two main data structures of Pandas are Series and Dataframe. Let’s analys both of them.

3.2.1. Pandas series#

Let’s start by importing Pandas.

import pandas as pd

A pandas series is a one-dimensional array of Numpy-type objects, and an associated array of labels, called index.

first_pandas = pd.Series([5,3,7,3,7,1])
first_pandas
0    5
1    3
2    7
3    3
4    7
5    1
dtype: int64

If we do not pass an index when defining a series, it will be given the default index of sequential values.

first_pandas.index
RangeIndex(start=0, stop=6, step=1)

You can define a custom index when creating a series.

second_pandas = pd.Series([1,2,3], index=['first','second','third'])
second_pandas
first     1
second    2
third     3
dtype: int64

A pandas series works like a dict. You can use the index value to pick up a single item.

second_pandas['second']
2

You can also use boolean indexing. Notice how the following command picks up the values on index locations 0, 2 and 4.

first_pandas[first_pandas > 4]
0    5
2    7
4    7
dtype: int64

You can use Numpy operations to your Pandas series.

import numpy as np
first_pandas*2
0    10
1     6
2    14
3     6
4    14
5     2
dtype: int64
np.log(first_pandas)
0    1.609438
1    1.098612
2    1.945910
3    1.098612
4    1.945910
5    0.000000
dtype: float64

As you have noticed, Pandas Series is very similar to a Python dict. Actually, you can very easily create a Pandas series from a dict.

sample_dict = {'a': 500, 'b' : 1000, 'c' : 1500, 'd': 2000}
sample_series = pd.Series(sample_dict)
sample_series
a     500
b    1000
c    1500
d    2000
dtype: int64

Important feature in Pandas is that everything is aligned by the index.

sample_series2 = pd.Series([100,200,300,400], index=['c','a','d','b'])
sample_series + sample_series2
a     700
b    1400
c    1600
d    2300
dtype: int64

If you look carefully, you can see that the summing was done “index-wise”.

You can give a name to the values and index of your Series.

sample_series.name = 'some_values'
sample_series.index.name = 'some_letters'
sample_series
some_letters
a     500
b    1000
c    1500
d    2000
Name: some_values, dtype: int64

You can also change your index values very easily.

sample_series.index = ['aa','bb','cc','dd']
sample_series
aa     500
bb    1000
cc    1500
dd    2000
Name: some_values, dtype: int64

3.2.2. Dataframe#

Pandas Dataframe is a rectangular table of objects with similar features as Pandas Series. Each column can have different type of values. In a way, it is a 2-dimensional extension of Pandas Series. For example, it has both a row and a column index.

A dataframe can be constucted in many ways. One popular option is to use a dict of equal-length lists.

company_data = {'year': ['2015','2016','2017','2018','2019','2020'],
                'sales': [102000, 101000, 105000, 115000, 111000, 109000],
               'profit': [5000, 6000, 8000, 7000, 9000, 3000]}
company_df = pd.DataFrame(company_data)
company_df
year sales profit
0 2015 102000 5000
1 2016 101000 6000
2 2017 105000 8000
3 2018 115000 7000
4 2019 111000 9000
5 2020 109000 3000

You can define the order of columns when creating a dataframe.

pd.DataFrame(company_data, columns = ['year','profit','sales'])
year profit sales
0 2015 5000 102000
1 2016 6000 101000
2 2017 8000 105000
3 2018 7000 115000
4 2019 9000 111000
5 2020 3000 109000

If you add a column to the list that is not included in the dict, it will be added to the dataframe with missing values.

company_df = pd.DataFrame(company_data, columns = ['year','sales','profit','total_assets'])
company_df
year sales profit total_assets
0 2015 102000 5000 NaN
1 2016 101000 6000 NaN
2 2017 105000 8000 NaN
3 2018 115000 7000 NaN
4 2019 111000 9000 NaN
5 2020 109000 3000 NaN

You can pick up one column from a dataframe using the usual bracket-notation. It will return the column as a Pandas series.

company_df['profit']
0    5000
1    6000
2    8000
3    7000
4    9000
5    3000
Name: profit, dtype: int64

Value assignment to a dataframe is easy. You can either add one value that is repeated in every instance of a column.

company_df['total_assets'] = 105000
company_df
year sales profit total_assets
0 2015 102000 5000 105000
1 2016 101000 6000 105000
2 2017 105000 8000 105000
3 2018 115000 7000 105000
4 2019 111000 9000 105000
5 2020 109000 3000 105000

Or you can add a full list of values.

company_df['total_assets'] = np.random.randint(105000,120000,6)
company_df
year sales profit total_assets
0 2015 102000 5000 109004
1 2016 101000 6000 109098
2 2017 105000 8000 108761
3 2018 115000 7000 113786
4 2019 111000 9000 108386
5 2020 109000 3000 112777

Remember that if you add a Pandas series to a datarame, the values will be matched according to their index values! For missing index values, a NaN-value will be added.

debt_sr = pd.Series([9000,6000,8000,10000],index=[2,0,5,3])
debt_sr
2     9000
0     6000
5     8000
3    10000
dtype: int64
company_df['debt']  = debt_sr
company_df
year sales profit total_assets debt
0 2015 102000 5000 109004 6000.0
1 2016 101000 6000 109098 NaN
2 2017 105000 8000 108761 9000.0
3 2018 115000 7000 113786 10000.0
4 2019 111000 9000 108386 NaN
5 2020 109000 3000 112777 8000.0

There are many ways to remove columns. One option is to use del that works also with other Python data types.

del company_df['debt']
company_df
year sales profit total_assets
0 2015 102000 5000 109004
1 2016 101000 6000 109098
2 2017 105000 8000 108761
3 2018 115000 7000 113786
4 2019 111000 9000 108386
5 2020 109000 3000 112777

A dict of dicts can also by used to create a dataframe. The keys of the outer dict are the columns and the keys of the inner dict are the index values. (Btw., In real applications we are not “writing” our data. Almost always we load the data from a file, using for example, Pandas read_csv(). So, do not worry.)

company_dict = {'company a': {2010: 10.9, 2011: 8.7, 2012: 9.5},'company b': {2010: 11.2, 2011: 9.6, 2012: 8.8}}
new_df = pd.DataFrame(company_dict)
new_df
company a company b
2010 10.9 11.2
2011 8.7 9.6
2012 9.5 8.8

Transposing data is done in the same way as in Numpy, just by adding .T after the name of the dataframe.

new_df.T
2010 2011 2012
company a 10.9 8.7 9.5
company b 11.2 9.6 8.8

There are still many other ways to construct a dataframe, and we will use some of them in the following chapters. A detailed introduction to constructing a dataframe can be found here: pandas.pydata.org/pandas-docs/stable/user_guide/dsintro.html

There is actually a third data type in Pandas, Index objects. They are used to hold the labels of columns and index of a dataframe or series. A Pandas index is immutable and has similiraties with Python sets. However, the Pandas index can contain duplicate items.

new_df.index
Int64Index([2010, 2011, 2012], dtype='int64')

3.3. Basic Pandas functions#

Let’s use a more realistic data for the following examples.( More about read_csv() in the later chapters.)

big_companies_df = pd.read_csv('big_companies.csv',index_col='NAME',delimiter=';')
big_companies_df
DIVIDEND YIELD CASH FLOW/SALES RETURN ON EQUITY - TOTAL (%) RETURN ON INVESTED CAPITAL P/E - RATIO TOTAL DEBT % COMMON EQUITY
NAME
APPLE 0.71 28.01 55.92 27.65 35.4 119.40
SAUDI ARABIAN OIL 0.21 31.73 32.25 28.34 29.5 16.96
MICROSOFT 1.06 43.46 40.14 24.89 36.5 61.56
AMAZON.COM 0.00 14.60 21.95 12.97 121.1 83.31
ALIBABA GROUP HOLDING ADR 1:8 0.00 31.23 23.77 17.39 31.0 16.60
FACEBOOK CLASS A 0.00 41.11 19.96 19.94 32.0 0.47
TENCENT HOLDINGS 0.23 35.55 24.68 15.92 42.6 53.74
ALPHABET A 0.00 33.18 18.12 17.76 33.2 2.26
TESLA 0.00 11.21 -14.94 -1.61 1036.3 202.77
WALMART 1.54 4.88 20.22 12.55 26.6 72.95
JOHNSON & JOHNSON 2.71 23.83 25.36 17.38 26.7 46.57
TAIWAN SEMICON.MNFG. 2.31 57.91 20.94 18.96 24.7 11.89
NESTLE 'N' 2.47 16.85 23.05 14.21 23.3 71.42

With reindex(), you can rearrange the values of the dataframe. Let’s load Numpy.

import numpy as np

The following command shuffles the order of the index values (company names).

random_order = np.random.permutation(big_companies_df.index)
random_order
array(['MICROSOFT', 'APPLE', 'WALMART', 'ALPHABET A', "NESTLE 'N'",
       'TESLA', 'SAUDI ARABIAN OIL', 'TAIWAN SEMICON.MNFG.',
       'JOHNSON & JOHNSON', 'TENCENT HOLDINGS', 'AMAZON.COM',
       'FACEBOOK CLASS A', 'ALIBABA GROUP HOLDING ADR 1:8'], dtype=object)
big_companies_df.reindex(index = random_order)
DIVIDEND YIELD CASH FLOW/SALES RETURN ON EQUITY - TOTAL (%) RETURN ON INVESTED CAPITAL P/E - RATIO TOTAL DEBT % COMMON EQUITY
NAME
MICROSOFT 1.06 43.46 40.14 24.89 36.5 61.56
APPLE 0.71 28.01 55.92 27.65 35.4 119.40
WALMART 1.54 4.88 20.22 12.55 26.6 72.95
ALPHABET A 0.00 33.18 18.12 17.76 33.2 2.26
NESTLE 'N' 2.47 16.85 23.05 14.21 23.3 71.42
TESLA 0.00 11.21 -14.94 -1.61 1036.3 202.77
SAUDI ARABIAN OIL 0.21 31.73 32.25 28.34 29.5 16.96
TAIWAN SEMICON.MNFG. 2.31 57.91 20.94 18.96 24.7 11.89
JOHNSON & JOHNSON 2.71 23.83 25.36 17.38 26.7 46.57
TENCENT HOLDINGS 0.23 35.55 24.68 15.92 42.6 53.74
AMAZON.COM 0.00 14.60 21.95 12.97 121.1 83.31
FACEBOOK CLASS A 0.00 41.11 19.96 19.94 32.0 0.47
ALIBABA GROUP HOLDING ADR 1:8 0.00 31.23 23.77 17.39 31.0 16.60

You can also rearrange columns with reindex().

random_columns = np.random.permutation(big_companies_df.columns)
random_columns
array(['TOTAL DEBT % COMMON EQUITY', 'RETURN ON EQUITY - TOTAL (%)',
       'P/E - RATIO', 'CASH FLOW/SALES', 'RETURN ON INVESTED CAPITAL',
       'DIVIDEND YIELD'], dtype=object)
big_companies_df.reindex(columns = random_columns)
TOTAL DEBT % COMMON EQUITY RETURN ON EQUITY - TOTAL (%) P/E - RATIO CASH FLOW/SALES RETURN ON INVESTED CAPITAL DIVIDEND YIELD
NAME
APPLE 119.40 55.92 35.4 28.01 27.65 0.71
SAUDI ARABIAN OIL 16.96 32.25 29.5 31.73 28.34 0.21
MICROSOFT 61.56 40.14 36.5 43.46 24.89 1.06
AMAZON.COM 83.31 21.95 121.1 14.60 12.97 0.00
ALIBABA GROUP HOLDING ADR 1:8 16.60 23.77 31.0 31.23 17.39 0.00
FACEBOOK CLASS A 0.47 19.96 32.0 41.11 19.94 0.00
TENCENT HOLDINGS 53.74 24.68 42.6 35.55 15.92 0.23
ALPHABET A 2.26 18.12 33.2 33.18 17.76 0.00
TESLA 202.77 -14.94 1036.3 11.21 -1.61 0.00
WALMART 72.95 20.22 26.6 4.88 12.55 1.54
JOHNSON & JOHNSON 46.57 25.36 26.7 23.83 17.38 2.71
TAIWAN SEMICON.MNFG. 11.89 20.94 24.7 57.91 18.96 2.31
NESTLE 'N' 71.42 23.05 23.3 16.85 14.21 2.47

If we have names in the reindex list that are not included in the original index/columns, these are added to the dataframe with missing values. You can prevent those missing values, for example, with ffill in the parameter method (there is also bfill and nearest).

new_df = big_companies_df.reindex(['APPLE','MICROSOFT','TESLA'])
new_df
DIVIDEND YIELD CASH FLOW/SALES RETURN ON EQUITY - TOTAL (%) RETURN ON INVESTED CAPITAL P/E - RATIO TOTAL DEBT % COMMON EQUITY
NAME
APPLE 0.71 28.01 55.92 27.65 35.4 119.40
MICROSOFT 1.06 43.46 40.14 24.89 36.5 61.56
TESLA 0.00 11.21 -14.94 -1.61 1036.3 202.77

Ffill is a forward fill. With strings, it adds the values of the original dataframe forward, according to the alphabetical order of the strings. Because APPLE is after AMAZON.COM, ALPHABET and ALIBABA… in alphabetical order, NaN values are put to these new rows.

new_df.reindex(big_companies_df.index,method = 'ffill')
DIVIDEND YIELD CASH FLOW/SALES RETURN ON EQUITY - TOTAL (%) RETURN ON INVESTED CAPITAL P/E - RATIO TOTAL DEBT % COMMON EQUITY
NAME
APPLE 0.71 28.01 55.92 27.65 35.4 119.40
SAUDI ARABIAN OIL 1.06 43.46 40.14 24.89 36.5 61.56
MICROSOFT 1.06 43.46 40.14 24.89 36.5 61.56
AMAZON.COM NaN NaN NaN NaN NaN NaN
ALIBABA GROUP HOLDING ADR 1:8 NaN NaN NaN NaN NaN NaN
FACEBOOK CLASS A 0.71 28.01 55.92 27.65 35.4 119.40
TENCENT HOLDINGS 1.06 43.46 40.14 24.89 36.5 61.56
ALPHABET A NaN NaN NaN NaN NaN NaN
TESLA 0.00 11.21 -14.94 -1.61 1036.3 202.77
WALMART 0.00 11.21 -14.94 -1.61 1036.3 202.77
JOHNSON & JOHNSON 0.71 28.01 55.92 27.65 35.4 119.40
TAIWAN SEMICON.MNFG. 1.06 43.46 40.14 24.89 36.5 61.56
NESTLE 'N' 1.06 43.46 40.14 24.89 36.5 61.56

It is easy to drop values by index values from a series/dataframe using drop(). By default, Pandas uses index. Defining axis=1, you can drop columns. Another option is to drop both using index and columns

big_companies_df.drop('APPLE')
DIVIDEND YIELD CASH FLOW/SALES RETURN ON EQUITY - TOTAL (%) RETURN ON INVESTED CAPITAL P/E - RATIO TOTAL DEBT % COMMON EQUITY
NAME
SAUDI ARABIAN OIL 0.21 31.73 32.25 28.34 29.5 16.96
MICROSOFT 1.06 43.46 40.14 24.89 36.5 61.56
AMAZON.COM 0.00 14.60 21.95 12.97 121.1 83.31
ALIBABA GROUP HOLDING ADR 1:8 0.00 31.23 23.77 17.39 31.0 16.60
FACEBOOK CLASS A 0.00 41.11 19.96 19.94 32.0 0.47
TENCENT HOLDINGS 0.23 35.55 24.68 15.92 42.6 53.74
ALPHABET A 0.00 33.18 18.12 17.76 33.2 2.26
TESLA 0.00 11.21 -14.94 -1.61 1036.3 202.77
WALMART 1.54 4.88 20.22 12.55 26.6 72.95
JOHNSON & JOHNSON 2.71 23.83 25.36 17.38 26.7 46.57
TAIWAN SEMICON.MNFG. 2.31 57.91 20.94 18.96 24.7 11.89
NESTLE 'N' 2.47 16.85 23.05 14.21 23.3 71.42
big_companies_df.drop('DIVIDEND YIELD',axis=1)
CASH FLOW/SALES RETURN ON EQUITY - TOTAL (%) RETURN ON INVESTED CAPITAL P/E - RATIO TOTAL DEBT % COMMON EQUITY
NAME
APPLE 28.01 55.92 27.65 35.4 119.40
SAUDI ARABIAN OIL 31.73 32.25 28.34 29.5 16.96
MICROSOFT 43.46 40.14 24.89 36.5 61.56
AMAZON.COM 14.60 21.95 12.97 121.1 83.31
ALIBABA GROUP HOLDING ADR 1:8 31.23 23.77 17.39 31.0 16.60
FACEBOOK CLASS A 41.11 19.96 19.94 32.0 0.47
TENCENT HOLDINGS 35.55 24.68 15.92 42.6 53.74
ALPHABET A 33.18 18.12 17.76 33.2 2.26
TESLA 11.21 -14.94 -1.61 1036.3 202.77
WALMART 4.88 20.22 12.55 26.6 72.95
JOHNSON & JOHNSON 23.83 25.36 17.38 26.7 46.57
TAIWAN SEMICON.MNFG. 57.91 20.94 18.96 24.7 11.89
NESTLE 'N' 16.85 23.05 14.21 23.3 71.42
big_companies_df.drop(index=['APPLE','TESLA'],columns=['DIVIDEND YIELD','P/E - RATIO'])
CASH FLOW/SALES RETURN ON EQUITY - TOTAL (%) RETURN ON INVESTED CAPITAL TOTAL DEBT % COMMON EQUITY
NAME
SAUDI ARABIAN OIL 31.73 32.25 28.34 16.96
MICROSOFT 43.46 40.14 24.89 61.56
AMAZON.COM 14.60 21.95 12.97 83.31
ALIBABA GROUP HOLDING ADR 1:8 31.23 23.77 17.39 16.60
FACEBOOK CLASS A 41.11 19.96 19.94 0.47
TENCENT HOLDINGS 35.55 24.68 15.92 53.74
ALPHABET A 33.18 18.12 17.76 2.26
WALMART 4.88 20.22 12.55 72.95
JOHNSON & JOHNSON 23.83 25.36 17.38 46.57
TAIWAN SEMICON.MNFG. 57.91 20.94 18.96 11.89
NESTLE 'N' 16.85 23.05 14.21 71.42

An important thing to notice is that most of the Pandas methods are non-destructive. In the previous operations, we have made many manipulations to the big_companies_df dataframe. However, it is still the same.

big_companies_df
DIVIDEND YIELD CASH FLOW/SALES RETURN ON EQUITY - TOTAL (%) RETURN ON INVESTED CAPITAL P/E - RATIO TOTAL DEBT % COMMON EQUITY
NAME
APPLE 0.71 28.01 55.92 27.65 35.4 119.40
SAUDI ARABIAN OIL 0.21 31.73 32.25 28.34 29.5 16.96
MICROSOFT 1.06 43.46 40.14 24.89 36.5 61.56
AMAZON.COM 0.00 14.60 21.95 12.97 121.1 83.31
ALIBABA GROUP HOLDING ADR 1:8 0.00 31.23 23.77 17.39 31.0 16.60
FACEBOOK CLASS A 0.00 41.11 19.96 19.94 32.0 0.47
TENCENT HOLDINGS 0.23 35.55 24.68 15.92 42.6 53.74
ALPHABET A 0.00 33.18 18.12 17.76 33.2 2.26
TESLA 0.00 11.21 -14.94 -1.61 1036.3 202.77
WALMART 1.54 4.88 20.22 12.55 26.6 72.95
JOHNSON & JOHNSON 2.71 23.83 25.36 17.38 26.7 46.57
TAIWAN SEMICON.MNFG. 2.31 57.91 20.94 18.96 24.7 11.89
NESTLE 'N' 2.47 16.85 23.05 14.21 23.3 71.42

Pandas makes almost all manipulations so that the manipulated dataframe is returned as a new object. If you want to change the original dataframe, almost all the methods have inplace=True for that.

Indexing, selecting values and slicing Pandas dataframes is very similar to other Python objects. One important difference is that when using labels for slicing, the last label is inclusive.

big_companies_df['MICROSOFT':'TESLA']
DIVIDEND YIELD CASH FLOW/SALES RETURN ON EQUITY - TOTAL (%) RETURN ON INVESTED CAPITAL P/E - RATIO TOTAL DEBT % COMMON EQUITY
NAME
MICROSOFT 1.06 43.46 40.14 24.89 36.5 61.56
AMAZON.COM 0.00 14.60 21.95 12.97 121.1 83.31
ALIBABA GROUP HOLDING ADR 1:8 0.00 31.23 23.77 17.39 31.0 16.60
FACEBOOK CLASS A 0.00 41.11 19.96 19.94 32.0 0.47
TENCENT HOLDINGS 0.23 35.55 24.68 15.92 42.6 53.74
ALPHABET A 0.00 33.18 18.12 17.76 33.2 2.26
TESLA 0.00 11.21 -14.94 -1.61 1036.3 202.77

You can also pick up columns using double brackets, put the slicing is not possible.

big_companies_df[['DIVIDEND YIELD','CASH FLOW/SALES']]
DIVIDEND YIELD CASH FLOW/SALES
NAME
APPLE 0.71 28.01
SAUDI ARABIAN OIL 0.21 31.73
MICROSOFT 1.06 43.46
AMAZON.COM 0.00 14.60
ALIBABA GROUP HOLDING ADR 1:8 0.00 31.23
FACEBOOK CLASS A 0.00 41.11
TENCENT HOLDINGS 0.23 35.55
ALPHABET A 0.00 33.18
TESLA 0.00 11.21
WALMART 1.54 4.88
JOHNSON & JOHNSON 2.71 23.83
TAIWAN SEMICON.MNFG. 2.31 57.91
NESTLE 'N' 2.47 16.85

You can also pick up values using booleans.

big_companies_df[big_companies_df['P/E - RATIO'] > 40]
DIVIDEND YIELD CASH FLOW/SALES RETURN ON EQUITY - TOTAL (%) RETURN ON INVESTED CAPITAL P/E - RATIO TOTAL DEBT % COMMON EQUITY
NAME
AMAZON.COM 0.00 14.60 21.95 12.97 121.1 83.31
TENCENT HOLDINGS 0.23 35.55 24.68 15.92 42.6 53.74
TESLA 0.00 11.21 -14.94 -1.61 1036.3 202.77

You can build a boolean dataframe easily. In the following, we check if a value in the dataframe is larger than 10.

big_companies_df > 10
DIVIDEND YIELD CASH FLOW/SALES RETURN ON EQUITY - TOTAL (%) RETURN ON INVESTED CAPITAL P/E - RATIO TOTAL DEBT % COMMON EQUITY
NAME
APPLE False True True True True True
SAUDI ARABIAN OIL False True True True True True
MICROSOFT False True True True True True
AMAZON.COM False True True True True True
ALIBABA GROUP HOLDING ADR 1:8 False True True True True True
FACEBOOK CLASS A False True True True True False
TENCENT HOLDINGS False True True True True True
ALPHABET A False True True True True False
TESLA False True False False True True
WALMART False False True True True True
JOHNSON & JOHNSON False True True True True True
TAIWAN SEMICON.MNFG. False True True True True True
NESTLE 'N' False True True True True True

To make the indexing more clear, Pandas has operators loc and iloc that can be used to select the rows and columns from a dataframe using Numpy-like notation. loc works with labels and iloc with integer positions.

big_companies_df
DIVIDEND YIELD CASH FLOW/SALES RETURN ON EQUITY - TOTAL (%) RETURN ON INVESTED CAPITAL P/E - RATIO TOTAL DEBT % COMMON EQUITY
NAME
APPLE 0.71 28.01 55.92 27.65 35.4 119.40
SAUDI ARABIAN OIL 0.21 31.73 32.25 28.34 29.5 16.96
MICROSOFT 1.06 43.46 40.14 24.89 36.5 61.56
AMAZON.COM 0.00 14.60 21.95 12.97 121.1 83.31
ALIBABA GROUP HOLDING ADR 1:8 0.00 31.23 23.77 17.39 31.0 16.60
FACEBOOK CLASS A 0.00 41.11 19.96 19.94 32.0 0.47
TENCENT HOLDINGS 0.23 35.55 24.68 15.92 42.6 53.74
ALPHABET A 0.00 33.18 18.12 17.76 33.2 2.26
TESLA 0.00 11.21 -14.94 -1.61 1036.3 202.77
WALMART 1.54 4.88 20.22 12.55 26.6 72.95
JOHNSON & JOHNSON 2.71 23.83 25.36 17.38 26.7 46.57
TAIWAN SEMICON.MNFG. 2.31 57.91 20.94 18.96 24.7 11.89
NESTLE 'N' 2.47 16.85 23.05 14.21 23.3 71.42

Notice that with loc we can use slicing also with columns.

big_companies_df.loc['TENCENT HOLDINGS':'TAIWAN SEMICON.MNFG.','DIVIDEND YIELD':'P/E - RATIO']
DIVIDEND YIELD CASH FLOW/SALES RETURN ON EQUITY - TOTAL (%) RETURN ON INVESTED CAPITAL P/E - RATIO
NAME
TENCENT HOLDINGS 0.23 35.55 24.68 15.92 42.6
ALPHABET A 0.00 33.18 18.12 17.76 33.2
TESLA 0.00 11.21 -14.94 -1.61 1036.3
WALMART 1.54 4.88 20.22 12.55 26.6
JOHNSON & JOHNSON 2.71 23.83 25.36 17.38 26.7
TAIWAN SEMICON.MNFG. 2.31 57.91 20.94 18.96 24.7
big_companies_df.iloc[6:12,0:5]
DIVIDEND YIELD CASH FLOW/SALES RETURN ON EQUITY - TOTAL (%) RETURN ON INVESTED CAPITAL P/E - RATIO
NAME
TENCENT HOLDINGS 0.23 35.55 24.68 15.92 42.6
ALPHABET A 0.00 33.18 18.12 17.76 33.2
TESLA 0.00 11.21 -14.94 -1.61 1036.3
WALMART 1.54 4.88 20.22 12.55 26.6
JOHNSON & JOHNSON 2.71 23.83 25.36 17.38 26.7
TAIWAN SEMICON.MNFG. 2.31 57.91 20.94 18.96 24.7

When adding dataframes, any index-column pairs that are missing from either dataframe, will be replaced with NaN-values. Let’s look an example. We create two dataframes with partial data. Both are missing one (different) row and one column.

partial_data_df = big_companies_df.drop(index='APPLE',columns='DIVIDEND YIELD')
partial_data_df
CASH FLOW/SALES RETURN ON EQUITY - TOTAL (%) RETURN ON INVESTED CAPITAL P/E - RATIO TOTAL DEBT % COMMON EQUITY
NAME
SAUDI ARABIAN OIL 31.73 32.25 28.34 29.5 16.96
MICROSOFT 43.46 40.14 24.89 36.5 61.56
AMAZON.COM 14.60 21.95 12.97 121.1 83.31
ALIBABA GROUP HOLDING ADR 1:8 31.23 23.77 17.39 31.0 16.60
FACEBOOK CLASS A 41.11 19.96 19.94 32.0 0.47
TENCENT HOLDINGS 35.55 24.68 15.92 42.6 53.74
ALPHABET A 33.18 18.12 17.76 33.2 2.26
TESLA 11.21 -14.94 -1.61 1036.3 202.77
WALMART 4.88 20.22 12.55 26.6 72.95
JOHNSON & JOHNSON 23.83 25.36 17.38 26.7 46.57
TAIWAN SEMICON.MNFG. 57.91 20.94 18.96 24.7 11.89
NESTLE 'N' 16.85 23.05 14.21 23.3 71.42
partial_data2_df = big_companies_df.drop(index='TESLA',columns='CASH FLOW/SALES')
partial_data2_df
DIVIDEND YIELD RETURN ON EQUITY - TOTAL (%) RETURN ON INVESTED CAPITAL P/E - RATIO TOTAL DEBT % COMMON EQUITY
NAME
APPLE 0.71 55.92 27.65 35.4 119.40
SAUDI ARABIAN OIL 0.21 32.25 28.34 29.5 16.96
MICROSOFT 1.06 40.14 24.89 36.5 61.56
AMAZON.COM 0.00 21.95 12.97 121.1 83.31
ALIBABA GROUP HOLDING ADR 1:8 0.00 23.77 17.39 31.0 16.60
FACEBOOK CLASS A 0.00 19.96 19.94 32.0 0.47
TENCENT HOLDINGS 0.23 24.68 15.92 42.6 53.74
ALPHABET A 0.00 18.12 17.76 33.2 2.26
WALMART 1.54 20.22 12.55 26.6 72.95
JOHNSON & JOHNSON 2.71 25.36 17.38 26.7 46.57
TAIWAN SEMICON.MNFG. 2.31 20.94 18.96 24.7 11.89
NESTLE 'N' 2.47 23.05 14.21 23.3 71.42

Those cells that are found in both dataframes have their values summed. If a specific cell is only in one of the dataframes, a NaN value is inserted to that location in the resulting dataframe.

partial_data_df + partial_data2_df
CASH FLOW/SALES DIVIDEND YIELD P/E - RATIO RETURN ON EQUITY - TOTAL (%) RETURN ON INVESTED CAPITAL TOTAL DEBT % COMMON EQUITY
NAME
ALIBABA GROUP HOLDING ADR 1:8 NaN NaN 62.0 47.54 34.78 33.20
ALPHABET A NaN NaN 66.4 36.24 35.52 4.52
AMAZON.COM NaN NaN 242.2 43.90 25.94 166.62
APPLE NaN NaN NaN NaN NaN NaN
FACEBOOK CLASS A NaN NaN 64.0 39.92 39.88 0.94
JOHNSON & JOHNSON NaN NaN 53.4 50.72 34.76 93.14
MICROSOFT NaN NaN 73.0 80.28 49.78 123.12
NESTLE 'N' NaN NaN 46.6 46.10 28.42 142.84
SAUDI ARABIAN OIL NaN NaN 59.0 64.50 56.68 33.92
TAIWAN SEMICON.MNFG. NaN NaN 49.4 41.88 37.92 23.78
TENCENT HOLDINGS NaN NaN 85.2 49.36 31.84 107.48
TESLA NaN NaN NaN NaN NaN NaN
WALMART NaN NaN 53.2 40.44 25.10 145.90

If you want to have some other than NaN values to these locations, you can use the add() method with the fill_value parameter. This will replace the missing values with a specified value. Notice how APPLE CASH FLOW/SALES and TESLA/DIVIDEND YIELD still have missing values. This is because they are missing from both partial dataframes.

partial_data_df.add(partial_data2_df,fill_value=0)
CASH FLOW/SALES DIVIDEND YIELD P/E - RATIO RETURN ON EQUITY - TOTAL (%) RETURN ON INVESTED CAPITAL TOTAL DEBT % COMMON EQUITY
NAME
ALIBABA GROUP HOLDING ADR 1:8 31.23 0.00 62.0 47.54 34.78 33.20
ALPHABET A 33.18 0.00 66.4 36.24 35.52 4.52
AMAZON.COM 14.60 0.00 242.2 43.90 25.94 166.62
APPLE NaN 0.71 35.4 55.92 27.65 119.40
FACEBOOK CLASS A 41.11 0.00 64.0 39.92 39.88 0.94
JOHNSON & JOHNSON 23.83 2.71 53.4 50.72 34.76 93.14
MICROSOFT 43.46 1.06 73.0 80.28 49.78 123.12
NESTLE 'N' 16.85 2.47 46.6 46.10 28.42 142.84
SAUDI ARABIAN OIL 31.73 0.21 59.0 64.50 56.68 33.92
TAIWAN SEMICON.MNFG. 57.91 2.31 49.4 41.88 37.92 23.78
TENCENT HOLDINGS 35.55 0.23 85.2 49.36 31.84 107.48
TESLA 11.21 NaN 1036.3 -14.94 -1.61 202.77
WALMART 4.88 1.54 53.2 40.44 25.10 145.90

You can also do arithmetic operations between a dataframe and a Pandas series. One thing to note is that these operations use broadcasting like Numpy arrays. Broadcasting is a very important concept in ML. You can read more about it here. numpy.org/doc/stable/user/basics.broadcasting.html

In its simplest, broadcasting means that if we are subtracting a Series from a Dataframe, the Series is subtracted from every row (or column) of the dataframe.

In the following, we subtract the values of TESLA from every row of our dataframe.

sample_series = big_companies_df.loc['TESLA']
sample_series
DIVIDEND YIELD                     0.00
CASH FLOW/SALES                   11.21
RETURN ON EQUITY - TOTAL (%)     -14.94
RETURN ON INVESTED CAPITAL        -1.61
P/E - RATIO                     1036.30
TOTAL DEBT % COMMON EQUITY       202.77
Name: TESLA, dtype: float64

The index of the series is matched with the columns of the dataframe.

big_companies_df-sample_series
DIVIDEND YIELD CASH FLOW/SALES RETURN ON EQUITY - TOTAL (%) RETURN ON INVESTED CAPITAL P/E - RATIO TOTAL DEBT % COMMON EQUITY
NAME
APPLE 0.71 16.80 70.86 29.26 -1000.9 -83.37
SAUDI ARABIAN OIL 0.21 20.52 47.19 29.95 -1006.8 -185.81
MICROSOFT 1.06 32.25 55.08 26.50 -999.8 -141.21
AMAZON.COM 0.00 3.39 36.89 14.58 -915.2 -119.46
ALIBABA GROUP HOLDING ADR 1:8 0.00 20.02 38.71 19.00 -1005.3 -186.17
FACEBOOK CLASS A 0.00 29.90 34.90 21.55 -1004.3 -202.30
TENCENT HOLDINGS 0.23 24.34 39.62 17.53 -993.7 -149.03
ALPHABET A 0.00 21.97 33.06 19.37 -1003.1 -200.51
TESLA 0.00 0.00 0.00 0.00 0.0 0.00
WALMART 1.54 -6.33 35.16 14.16 -1009.7 -129.82
JOHNSON & JOHNSON 2.71 12.62 40.30 18.99 -1009.6 -156.20
TAIWAN SEMICON.MNFG. 2.31 46.70 35.88 20.57 -1011.6 -190.88
NESTLE 'N' 2.47 5.64 37.99 15.82 -1013.0 -131.35

If the index/columns do not match, the resulting dataframe is a union.

sample_series = sample_series.append(pd.Series(1,index=['extra row']))
big_companies_df-sample_series
CASH FLOW/SALES DIVIDEND YIELD P/E - RATIO RETURN ON EQUITY - TOTAL (%) RETURN ON INVESTED CAPITAL TOTAL DEBT % COMMON EQUITY extra row
NAME
APPLE 16.80 0.71 -1000.9 70.86 29.26 -83.37 NaN
SAUDI ARABIAN OIL 20.52 0.21 -1006.8 47.19 29.95 -185.81 NaN
MICROSOFT 32.25 1.06 -999.8 55.08 26.50 -141.21 NaN
AMAZON.COM 3.39 0.00 -915.2 36.89 14.58 -119.46 NaN
ALIBABA GROUP HOLDING ADR 1:8 20.02 0.00 -1005.3 38.71 19.00 -186.17 NaN
FACEBOOK CLASS A 29.90 0.00 -1004.3 34.90 21.55 -202.30 NaN
TENCENT HOLDINGS 24.34 0.23 -993.7 39.62 17.53 -149.03 NaN
ALPHABET A 21.97 0.00 -1003.1 33.06 19.37 -200.51 NaN
TESLA 0.00 0.00 0.0 0.00 0.00 0.00 NaN
WALMART -6.33 1.54 -1009.7 35.16 14.16 -129.82 NaN
JOHNSON & JOHNSON 12.62 2.71 -1009.6 40.30 18.99 -156.20 NaN
TAIWAN SEMICON.MNFG. 46.70 2.31 -1011.6 35.88 20.57 -190.88 NaN
NESTLE 'N' 5.64 2.47 -1013.0 37.99 15.82 -131.35 NaN

Basically all the Numpy’s element-wise array methods work with Pandas objects.

Tesla looks a little bit better now (no negative returns). :)

np.abs(big_companies_df)
DIVIDEND YIELD CASH FLOW/SALES RETURN ON EQUITY - TOTAL (%) RETURN ON INVESTED CAPITAL P/E - RATIO TOTAL DEBT % COMMON EQUITY
NAME
APPLE 0.71 28.01 55.92 27.65 35.4 119.40
SAUDI ARABIAN OIL 0.21 31.73 32.25 28.34 29.5 16.96
MICROSOFT 1.06 43.46 40.14 24.89 36.5 61.56
AMAZON.COM 0.00 14.60 21.95 12.97 121.1 83.31
ALIBABA GROUP HOLDING ADR 1:8 0.00 31.23 23.77 17.39 31.0 16.60
FACEBOOK CLASS A 0.00 41.11 19.96 19.94 32.0 0.47
TENCENT HOLDINGS 0.23 35.55 24.68 15.92 42.6 53.74
ALPHABET A 0.00 33.18 18.12 17.76 33.2 2.26
TESLA 0.00 11.21 14.94 1.61 1036.3 202.77
WALMART 1.54 4.88 20.22 12.55 26.6 72.95
JOHNSON & JOHNSON 2.71 23.83 25.36 17.38 26.7 46.57
TAIWAN SEMICON.MNFG. 2.31 57.91 20.94 18.96 24.7 11.89
NESTLE 'N' 2.47 16.85 23.05 14.21 23.3 71.42

If you want to apply a function to rows or columns of a dataframe, you can use apply.

differ = lambda x:np.max(x)-np.min(x)

differ calculates the difference between the maximum and minimum of each column.

big_companies_df.apply(differ)
DIVIDEND YIELD                     2.71
CASH FLOW/SALES                   53.03
RETURN ON EQUITY - TOTAL (%)      70.86
RETURN ON INVESTED CAPITAL        29.95
P/E - RATIO                     1013.00
TOTAL DEBT % COMMON EQUITY       202.30
dtype: float64

We can also do this row-wise.

big_companies_df.apply(differ,axis=1)
NAME
APPLE                             118.69
SAUDI ARABIAN OIL                  32.04
MICROSOFT                          60.50
AMAZON.COM                        121.10
ALIBABA GROUP HOLDING ADR 1:8      31.23
FACEBOOK CLASS A                   41.11
TENCENT HOLDINGS                   53.51
ALPHABET A                         33.20
TESLA                            1051.24
WALMART                            71.41
JOHNSON & JOHNSON                  43.86
TAIWAN SEMICON.MNFG.               55.60
NESTLE 'N'                         68.95
dtype: float64

For element-wise manipulations, you can use applymap().

big_companies_df.applymap(lambda x: -x)
DIVIDEND YIELD CASH FLOW/SALES RETURN ON EQUITY - TOTAL (%) RETURN ON INVESTED CAPITAL P/E - RATIO TOTAL DEBT % COMMON EQUITY
NAME
APPLE -0.71 -28.01 -55.92 -27.65 -35.4 -119.40
SAUDI ARABIAN OIL -0.21 -31.73 -32.25 -28.34 -29.5 -16.96
MICROSOFT -1.06 -43.46 -40.14 -24.89 -36.5 -61.56
AMAZON.COM -0.00 -14.60 -21.95 -12.97 -121.1 -83.31
ALIBABA GROUP HOLDING ADR 1:8 -0.00 -31.23 -23.77 -17.39 -31.0 -16.60
FACEBOOK CLASS A -0.00 -41.11 -19.96 -19.94 -32.0 -0.47
TENCENT HOLDINGS -0.23 -35.55 -24.68 -15.92 -42.6 -53.74
ALPHABET A -0.00 -33.18 -18.12 -17.76 -33.2 -2.26
TESLA -0.00 -11.21 14.94 1.61 -1036.3 -202.77
WALMART -1.54 -4.88 -20.22 -12.55 -26.6 -72.95
JOHNSON & JOHNSON -2.71 -23.83 -25.36 -17.38 -26.7 -46.57
TAIWAN SEMICON.MNFG. -2.31 -57.91 -20.94 -18.96 -24.7 -11.89
NESTLE 'N' -2.47 -16.85 -23.05 -14.21 -23.3 -71.42

sort_index can be used for sorting.

big_companies_df.sort_index()
DIVIDEND YIELD CASH FLOW/SALES RETURN ON EQUITY - TOTAL (%) RETURN ON INVESTED CAPITAL P/E - RATIO TOTAL DEBT % COMMON EQUITY
NAME
ALIBABA GROUP HOLDING ADR 1:8 0.00 31.23 23.77 17.39 31.0 16.60
ALPHABET A 0.00 33.18 18.12 17.76 33.2 2.26
AMAZON.COM 0.00 14.60 21.95 12.97 121.1 83.31
APPLE 0.71 28.01 55.92 27.65 35.4 119.40
FACEBOOK CLASS A 0.00 41.11 19.96 19.94 32.0 0.47
JOHNSON & JOHNSON 2.71 23.83 25.36 17.38 26.7 46.57
MICROSOFT 1.06 43.46 40.14 24.89 36.5 61.56
NESTLE 'N' 2.47 16.85 23.05 14.21 23.3 71.42
SAUDI ARABIAN OIL 0.21 31.73 32.25 28.34 29.5 16.96
TAIWAN SEMICON.MNFG. 2.31 57.91 20.94 18.96 24.7 11.89
TENCENT HOLDINGS 0.23 35.55 24.68 15.92 42.6 53.74
TESLA 0.00 11.21 -14.94 -1.61 1036.3 202.77
WALMART 1.54 4.88 20.22 12.55 26.6 72.95
big_companies_df.sort_index(axis=1)
CASH FLOW/SALES DIVIDEND YIELD P/E - RATIO RETURN ON EQUITY - TOTAL (%) RETURN ON INVESTED CAPITAL TOTAL DEBT % COMMON EQUITY
NAME
APPLE 28.01 0.71 35.4 55.92 27.65 119.40
SAUDI ARABIAN OIL 31.73 0.21 29.5 32.25 28.34 16.96
MICROSOFT 43.46 1.06 36.5 40.14 24.89 61.56
AMAZON.COM 14.60 0.00 121.1 21.95 12.97 83.31
ALIBABA GROUP HOLDING ADR 1:8 31.23 0.00 31.0 23.77 17.39 16.60
FACEBOOK CLASS A 41.11 0.00 32.0 19.96 19.94 0.47
TENCENT HOLDINGS 35.55 0.23 42.6 24.68 15.92 53.74
ALPHABET A 33.18 0.00 33.2 18.12 17.76 2.26
TESLA 11.21 0.00 1036.3 -14.94 -1.61 202.77
WALMART 4.88 1.54 26.6 20.22 12.55 72.95
JOHNSON & JOHNSON 23.83 2.71 26.7 25.36 17.38 46.57
TAIWAN SEMICON.MNFG. 57.91 2.31 24.7 20.94 18.96 11.89
NESTLE 'N' 16.85 2.47 23.3 23.05 14.21 71.42

If you want sort by values, you can use sort_values()

big_companies_df.sort_values(by = 'P/E - RATIO',ascending=False)
DIVIDEND YIELD CASH FLOW/SALES RETURN ON EQUITY - TOTAL (%) RETURN ON INVESTED CAPITAL P/E - RATIO TOTAL DEBT % COMMON EQUITY
NAME
TESLA 0.00 11.21 -14.94 -1.61 1036.3 202.77
AMAZON.COM 0.00 14.60 21.95 12.97 121.1 83.31
TENCENT HOLDINGS 0.23 35.55 24.68 15.92 42.6 53.74
MICROSOFT 1.06 43.46 40.14 24.89 36.5 61.56
APPLE 0.71 28.01 55.92 27.65 35.4 119.40
ALPHABET A 0.00 33.18 18.12 17.76 33.2 2.26
FACEBOOK CLASS A 0.00 41.11 19.96 19.94 32.0 0.47
ALIBABA GROUP HOLDING ADR 1:8 0.00 31.23 23.77 17.39 31.0 16.60
SAUDI ARABIAN OIL 0.21 31.73 32.25 28.34 29.5 16.96
JOHNSON & JOHNSON 2.71 23.83 25.36 17.38 26.7 46.57
WALMART 1.54 4.88 20.22 12.55 26.6 72.95
TAIWAN SEMICON.MNFG. 2.31 57.91 20.94 18.96 24.7 11.89
NESTLE 'N' 2.47 16.85 23.05 14.21 23.3 71.42

You can use many colums when sorting by values.

big_companies_df.sort_values(by = ['DIVIDEND YIELD','P/E - RATIO'],ascending=[True,False])
DIVIDEND YIELD CASH FLOW/SALES RETURN ON EQUITY - TOTAL (%) RETURN ON INVESTED CAPITAL P/E - RATIO TOTAL DEBT % COMMON EQUITY
NAME
TESLA 0.00 11.21 -14.94 -1.61 1036.3 202.77
AMAZON.COM 0.00 14.60 21.95 12.97 121.1 83.31
ALPHABET A 0.00 33.18 18.12 17.76 33.2 2.26
FACEBOOK CLASS A 0.00 41.11 19.96 19.94 32.0 0.47
ALIBABA GROUP HOLDING ADR 1:8 0.00 31.23 23.77 17.39 31.0 16.60
SAUDI ARABIAN OIL 0.21 31.73 32.25 28.34 29.5 16.96
TENCENT HOLDINGS 0.23 35.55 24.68 15.92 42.6 53.74
APPLE 0.71 28.01 55.92 27.65 35.4 119.40
MICROSOFT 1.06 43.46 40.14 24.89 36.5 61.56
WALMART 1.54 4.88 20.22 12.55 26.6 72.95
TAIWAN SEMICON.MNFG. 2.31 57.91 20.94 18.96 24.7 11.89
NESTLE 'N' 2.47 16.85 23.05 14.21 23.3 71.42
JOHNSON & JOHNSON 2.71 23.83 25.36 17.38 26.7 46.57