Thursday, 3 September 2020

Numpy - II

After the first post on NumPy, we explore more aspects of the library. The idea is to cover, as far as possible, the most basic of these and thus lay the foundation for future work in AI, ML or Data Science.

As in the earlier post, we use a Jupyter notebook for all the work in this article. The code is in blue font and its output is in green font below the code. The version details are given below:

import sys
print("Python version:", sys.version)

import numpy as np
print("NumPy version:", np.__version__)

Python version: 3.8.3 (default, Jul  2 2020, 17:30:36) [MSC v.1916 64 bit (AMD64)]
NumPy version: 1.18.5

Some of the basic statistical functions are shown below:

numpy_array11 = np.array([[ 1, 2, 3, 4,  5,  6,  7,  8],
                          [ 6, 7, 8, 9, 10, 11, 12, 13]])
numpy_array11


array([[ 1,  2,  3,  4,  5,  6,  7,  8],
       [ 6,  7,  8,  9, 10, 11, 12, 13]])


numpy_array11.sum()


112

numpy_array11.min()

1

numpy_array11.max()

13

numpy_array11.mean()

7.0

np.median(numpy_array11)

7.0

numpy_array11.std()

3.391164991562634


numpy_array11.var()

11.5

numpy_array11.max(axis=0)   # maximum of each column

array([ 6,  7,  8,  9, 10, 11, 12, 13])


numpy_array11.max(axis=1)   # maximum of each row


array([ 8, 13])

numpy_array11.cumsum(axis=0)   # cumulative sum down each column

array([[ 1,  2,  3,  4,  5,  6,  7,  8],
       [ 7,  9, 11, 13, 15, 17, 19, 21]], dtype=int32)


numpy_array11.cumsum(axis=1)   # cumulative sum across each row

array([[ 1,  3,  6, 10, 15, 21, 28, 36],
       [ 6, 13, 21, 30, 40, 51, 63, 76]], dtype=int32)
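The axis argument used above decides which dimension is collapsed. The following is a minimal sketch of that behaviour on the same array; the variable names col_means, row_means and row_sums are illustrative and not part of the original post.

```python
import numpy as np

# Same 2 x 8 array as in the post
numpy_array11 = np.array([[1, 2, 3, 4, 5, 6, 7, 8],
                          [6, 7, 8, 9, 10, 11, 12, 13]])

# axis=0 collapses the rows, leaving one value per column
col_means = numpy_array11.mean(axis=0)
# axis=1 collapses the columns, leaving one value per row
row_means = numpy_array11.mean(axis=1)

print(col_means.shape)   # (8,)
print(row_means)         # [4.5 9.5]

# keepdims=True keeps the reduced axis with length 1,
# which is convenient when the result is broadcast back against the array
row_sums = numpy_array11.sum(axis=1, keepdims=True)
print(row_sums.shape)    # (2, 1)
```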

There are two ways arrays can be copied: Shallow copy and Deep copy. Commands for both are shown below:

numpy_array11 = np.array([[ 1, 2, 3, 4,  5,  6,  7,  8],
                          [ 6, 7, 8, 9, 10, 11, 12, 13]])
numpy_array11


array([[ 1,  2,  3,  4,  5,  6,  7,  8],
       [ 6,  7,  8,  9, 10, 11, 12, 13]])


numpy_array11_view = numpy_array11.view() 
numpy_array11_view


array([[ 1,  2,  3,  4,  5,  6,  7,  8],
       [ 6,  7,  8,  9, 10, 11, 12, 13]])

numpy_array11_view is a new array object that shares the same data as numpy_array11. This copying technique is called a shallow copy.


numpy_array11_view is numpy_array11     

 
False

The test above shows that numpy_array11_view is not the same object as numpy_array11.

numpy_array11_view.base is numpy_array11

True 

The test above confirms that the data in numpy_array11_view is based on numpy_array11.


id(numpy_array11)       # identifier of numpy_array11


2674179372976


id(numpy_array11_view)  # identifier of numpy_array11_view


2674179374496

The identifier of numpy_array11_view differs from that of numpy_array11, which again shows that they are separate objects.


numpy_array11_deepcopy = numpy_array11.copy()


numpy_array11_deepcopy

array([[ 1,  2,  3,  4,  5,  6,  7,  8],
       [ 6,  7,  8,  9, 10, 11, 12, 13]])

This manner of copying is called a deep copy: a new array object is created with its own copy of the data, so nothing is shared with the original.

numpy_array11_deepcopy is numpy_array11      

False

The test above shows that numpy_array11_deepcopy is not numpy_array11


numpy_array11_deepcopy.base is numpy_array11 

False

The test above shows that the data in numpy_array11_deepcopy is not based on numpy_array11 (the base of a copy is None).
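The practical difference between the two kinds of copy shows up when one of them is modified. Here is a small sketch, assuming a fresh array named original (a hypothetical name used only for this illustration):

```python
import numpy as np

original = np.array([[1, 2, 3], [4, 5, 6]])
view = original.view()   # shallow copy: new array object, same data buffer
deep = original.copy()   # deep copy: new array object with its own data

view[0, 0] = 100         # writes through to `original`
deep[0, 1] = 200         # affects only `deep`

print(original[0, 0])    # 100
print(original[0, 1])    # 2

# np.shares_memory reports whether two arrays overlap in memory
print(np.shares_memory(original, view))   # True
print(np.shares_memory(original, deep))   # False
```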

Examples of sorting arrays are shown below:

numpy_array16= np.random.randint(100, size=(5, 5))
numpy_array16


array([[38, 56, 80, 29, 32],
       [26, 55,  0, 47,  5],
       [29, 27, 50, 49, 35],
       [54, 50, 97, 56, 31],
       [63, 27, 38, 98, 84]])


np.sort(numpy_array16)        # sort along the last axis

array([[29, 32, 38, 56, 80],
       [ 0,  5, 26, 47, 55],
       [27, 29, 35, 49, 50],
       [31, 50, 54, 56, 97],
       [27, 38, 63, 84, 98]])


np.sort(numpy_array16, axis=0)   # sort along the first axis

array([[26, 27,  0, 29,  5],
       [29, 27, 38, 47, 31],
       [38, 50, 50, 49, 32],
       [54, 55, 80, 56, 35],
       [63, 56, 97, 98, 84]])


np.sort(numpy_array16, axis=None)  # sort the flattened array

array([ 0,  5, 26, 27, 27, 29, 29, 31, 32, 35, 38, 38, 47, 49, 50, 50, 54,
       55, 56, 56, 63, 80, 84, 97, 98])


dtype = [('name', 'S10'), ('salary', float), ('age', int)]
values = [('Nick', 5500, 41), ('Kyle', 6500, 44),
          ('Ken', 7500, 44)]


structured_array1 = np.array(values, dtype=dtype)       # create a structured array
np.sort(structured_array1, order='salary')              # sort by salary


array([(b'Nick', 5500., 41), (b'Kyle', 6500., 44), (b'Ken', 7500., 44)],
      dtype=[('name', 'S10'), ('salary', '<f8'), ('age', '<i4')])


np.sort(structured_array1, order=['age', 'salary'])     # sort by age, salary

array([(b'Nick', 5500., 41), (b'Kyle', 6500., 44), (b'Ken', 7500., 44)],
      dtype=[('name', 'S10'), ('salary', '<f8'), ('age', '<i4')])
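Two related patterns may be useful here: np.argsort returns the indices that would sort an array, and the order argument only changes the result when the records are not already in order. The sketch below uses made-up names (scores, staff) and a Unicode 'U10' name field instead of 'S10'; it is an illustration rather than part of the original notebook.

```python
import numpy as np

scores = np.array([38, 56, 80, 29, 32])

order = np.argsort(scores)          # indices that would sort the array
print(order)                        # [3 4 0 1 2]
print(scores[order])                # [29 32 38 56 80]

descending = np.sort(scores)[::-1]  # sort ascending, then reverse
print(descending)                   # [80 56 38 32 29]

# Structured array whose records start out unsorted
dtype = [('name', 'U10'), ('salary', float), ('age', int)]
staff = np.array([('Ken', 7500, 44), ('Nick', 5500, 41), ('Kyle', 6500, 44)],
                 dtype=dtype)

# Sort by age first; ties on age are broken by salary
by_age_then_salary = np.sort(staff, order=['age', 'salary'])
print(by_age_then_salary['name'])   # ['Nick' 'Kyle' 'Ken']
```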

Extracting elements from a NumPy array is one of the most important activities any developer will encounter. There are various techniques, such as subsetting, slicing and indexing. Examples of these techniques are described below:

a) Subsetting: In this technique, a subset of the array is extracted; it can be a single element, a row or a column

numpy_array17 = np.random.randint(100, size=(5, 5))
numpy_array17

array([[ 9, 12, 28, 68, 76],
       [74, 58, 37, 39, 46],
       [46, 15, 46, 24, 34],
       [33, 41, 53, 35, 30],
       [49, 78, 86, 57, 38]])

numpy_array17[0]          #Extracts first row

array([ 9, 12, 28, 68, 76])

numpy_array17[:,0]        #Extracts first column

array([ 9, 74, 46, 33, 49])

numpy_array17[2,2]       #Extracts single element

46

b) Slicing: In this technique, a slice consisting of one or more members is extracted

The syntax for slicing is [lower:upper:step], where the lower bound is included and the upper bound is excluded. step specifies the stride between elements and defaults to 1. Indices start at 0 for the first element; negative indices count from the end, so -1 refers to the last element. Some examples are shown below:

numpy_array18 = np.array([10,11,12,13,14])

numpy_array18[1:3]

array([11, 12])

numpy_array18[-4:3]

array([11, 12])

numpy_array18[:3]   # more like selecting head

array([10, 11, 12])

numpy_array18[-2:]  # more like selecting tail

array([13, 14])

numpy_array18[::2]

array([10, 12, 14])

numpy_array18[::-1]  # Reversing the array


array([14, 13, 12, 11, 10])
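Two points about slicing are worth a quick illustration: a basic slice is a view on the original data, and the same [lower:upper:step] syntax applies to each axis of a multi-dimensional array. The sketch below uses an extra array named grid, which is an assumption introduced only for this example.

```python
import numpy as np

numpy_array18 = np.array([10, 11, 12, 13, 14])

middle = numpy_array18[1:4]      # elements at indices 1, 2 and 3
middle[0] = 99                   # a basic slice is a view, so the original changes too
print(numpy_array18)             # [10 99 12 13 14]

grid = np.arange(12).reshape(3, 4)
print(grid[:2, 1:3])             # first two rows, columns 1 and 2
# [[1 2]
#  [5 6]]
print(grid[::2, ::-1])           # every other row, columns reversed
# [[ 3  2  1  0]
#  [11 10  9  8]]
```

If an independent slice is needed, calling .copy() on it avoids changing the original array.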

c) Indexing using boolean indices: In this technique, a boolean array selects the elements for which the condition is True, as shown below:

numpy_array18 = np.array([10,11,12,13,14])

numpy_array18[numpy_array18 >= 13]

array([13, 14])
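The comparison itself produces a boolean array (a mask), and masks can be combined or used in assignments. A minimal sketch follows; the names mask, in_range and clipped are illustrative.

```python
import numpy as np

numpy_array18 = np.array([10, 11, 12, 13, 14])

mask = numpy_array18 >= 13
print(mask.tolist())                         # [False, False, False, True, True]

# Combine conditions with & (and) and | (or); each condition needs its own parentheses
in_range = (numpy_array18 > 10) & (numpy_array18 < 14)
print(numpy_array18[in_range])               # [11 12 13]

# A mask can also appear on the left-hand side of an assignment
clipped = numpy_array18.copy()
clipped[clipped > 12] = 12
print(clipped)                               # [10 11 12 12 12]
```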

d) Fancy indexing: Lastly, fancy indexing uses arrays or lists of indices to select complex subsets of an array and to modify them by assignment

rand = np.random.RandomState(1)
numpy_array19 = rand.randint(100, size=10)
print(numpy_array19)

[37 12 72  9 75  5 79 64 16  1]

[numpy_array19[3], numpy_array19[7], numpy_array19[2]]

[9, 64, 72]

Alternatively, we can pass a single list or array of indices to obtain the same result:

ind = [3, 7, 2]

numpy_array19[ind]

array([ 9, 64, 72])

When using fancy indexing, the shape of the result reflects the shape of the index arrays rather than the shape of the array being indexed:

indices = np.array([[3, 7],
                [4, 5]])

numpy_array19[indices]


array([[ 9, 64],
       [75,  5]])


Fancy indexing also works in multiple dimensions. Consider the following array:

numpy_array20 = np.arange(12).reshape((3, 4))
numpy_array20


array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])


As with standard indexing, the first index refers to the row and the second to the column:

row = np.array([0, 1, 2])
col = np.array([2, 1, 3])
numpy_array20[row, col]


array([ 2,  5, 11])

We can modify the array as shown below:

row = np.array([0, 1, 2])
col = np.array([2, 1, 3])
numpy_array20[row, col] = -1
numpy_array20

array([[ 0,  1, -1,  3],
       [ 4, -1,  6,  7],
       [ 8,  9, 10, -1]])
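Two further properties of fancy indexing can be sketched briefly: reading with index arrays returns a copy rather than a view, and index arrays are broadcast against each other. The names picked and block below are hypothetical, and the array is recreated so the example runs on its own.

```python
import numpy as np

numpy_array20 = np.arange(12).reshape((3, 4))

# Fancy indexing returns a copy, so changing `picked` leaves the source untouched
picked = numpy_array20[[0, 2]]
picked[:] = 0
print(numpy_array20[0])          # [0 1 2 3]

# Index arrays are broadcast together: a (3, 1) column of row indices
# combined with a (3,) array of column indices selects a 3 x 3 block
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])
block = numpy_array20[row[:, np.newaxis], col]
print(block)
# [[ 2  1  3]
#  [ 6  5  7]
#  [10  9 11]]
```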

A few other operations are described below:

a) Changing array shape:

numpy_array21 = np.arange(24).reshape((2,2,2,3))
numpy_array21

array([[[[ 0,  1,  2],
         [ 3,  4,  5]],

        [[ 6,  7,  8],
         [ 9, 10, 11]]],


       [[[12, 13, 14],
         [15, 16, 17]],

        [[18, 19, 20],
         [21, 22, 23]]]])


numpy_array21 = numpy_array21.ravel()   # Flatten the array
numpy_array21


array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23])


numpy_array21.reshape((2,2,2,3))         # Reshape the array

array([[[[ 0,  1,  2],
         [ 3,  4,  5]],

        [[ 6,  7,  8],
         [ 9, 10, 11]]],


       [[[12, 13, 14],
         [15, 16, 17]],

        [[18, 19, 20],
         [21, 22, 23]]]])
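When reshaping, one dimension can be given as -1 so that NumPy infers it from the total size. It is also worth knowing that ravel() returns a view where possible, whereas flatten() always returns a copy. A short sketch, with illustrative variable names:

```python
import numpy as np

numpy_array21 = np.arange(24)

# -1 asks NumPy to infer that dimension from the total number of elements
inferred = numpy_array21.reshape((4, -1))
print(inferred.shape)                          # (4, 6)

# ravel() returns a view when it can; flatten() always returns a copy
flat_view = inferred.ravel()
flat_copy = inferred.flatten()
print(np.shares_memory(inferred, flat_view))   # True
print(np.shares_memory(inferred, flat_copy))   # False
```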
 

b)  Add and remove members:

numpy_array22=np.array([[0,1],[2,3]])
numpy_array22


array([[0, 1],
       [2, 3]])

np.resize(numpy_array22,(2,3))

array([[0, 1, 2],
       [3, 0, 1]])

The step above returns a new array with the specified shape. If the new array is larger than the original, it is filled with repeated copies of the original array's elements.

np.append(numpy_array22,numpy_array22)   # Append items to an array

array([0, 1, 2, 3, 0, 1, 2, 3])

np.insert(numpy_array22, 1, 5)   # No axis given: the array is flattened, then 5 is inserted before index 1

array([0, 5, 1, 2, 3])


np.insert(numpy_array22, 1, 5, axis=1)

array([[0, 5, 1],
       [2, 5, 3]])


np.delete(numpy_array22, 1, 0)  # Return a new array with row 1 deleted (axis=0)

array([[0, 1]])


np.delete(numpy_array22, 1, 1)  # Return a new array with column 1 deleted (axis=1)

array([[0],
       [2]])
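np.append also accepts an axis argument, in which case the result keeps its 2-D shape. The sketch below illustrates this and shows that the original array is left unchanged; the names with_row and with_col are made up for the example.

```python
import numpy as np

numpy_array22 = np.array([[0, 1], [2, 3]])

# With axis given, the appended values must match the other dimensions
with_row = np.append(numpy_array22, [[4, 5]], axis=0)
print(with_row)
# [[0 1]
#  [2 3]
#  [4 5]]

with_col = np.append(numpy_array22, [[9], [9]], axis=1)
print(with_col)
# [[0 1 9]
#  [2 3 9]]

# append, insert and delete all return new arrays; the original is unchanged
print(numpy_array22.shape)   # (2, 2)
```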
 

c) Combining arrays:

numpy_array23 = np.array([[1, 1], [2, 2], [3, 3]])
numpy_array23


array([[1, 1],
       [2, 2],
       [3, 3]]) 

np.concatenate((numpy_array23,numpy_array23),axis=0) 

array([[1, 1],
       [2, 2],
       [3, 3],
       [1, 1],
       [2, 2],
       [3, 3]])

The step above joins a sequence of arrays along an existing axis.

np.concatenate((numpy_array23,numpy_array23),axis=1)

array([[1, 1, 1, 1],
       [2, 2, 2, 2],
       [3, 3, 3, 3]])


np.concatenate((numpy_array23, numpy_array23), axis=None)

array([1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3])

numpy_array24 = np.array([1, 2, 3])
numpy_array25 = np.array([2, 3, 4])


np.vstack((numpy_array24, numpy_array25))     # Stack arrays vertically (row-wise)

array([[1, 2, 3],
       [2, 3, 4]])


np.hstack((numpy_array24, numpy_array25))   # stack arrays in sequence horizontally (column wise)

array([1, 2, 3, 2, 3, 4])

np.column_stack((numpy_array24, numpy_array25))   # Stack 1-D arrays as columns into a 2-D array

array([[1, 2],
       [2, 3],
       [3, 4]])
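For comparison, np.stack joins arrays along a new axis, whereas np.concatenate joins along one that already exists. A minimal sketch using the same two 1-D arrays (joined, as_rows and as_cols are illustrative names):

```python
import numpy as np

numpy_array24 = np.array([1, 2, 3])
numpy_array25 = np.array([2, 3, 4])

# concatenate joins along an axis that already exists
joined = np.concatenate((numpy_array24, numpy_array25))
print(joined.shape)                      # (6,)

# stack inserts a new axis: for 1-D inputs, axis=0 gives the same result
# as vstack and axis=1 the same result as column_stack
as_rows = np.stack((numpy_array24, numpy_array25), axis=0)
as_cols = np.stack((numpy_array24, numpy_array25), axis=1)
print(as_rows.shape, as_cols.shape)      # (2, 3) (3, 2)
```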

d) Splitting of arrays: Arrays can be split using hsplit (horizontal split) and vsplit (vertical split) as shown below:

numpy_array26 = np.arange(16.0).reshape(4, 4)
numpy_array26


array([[ 0.,  1.,  2.,  3.],
       [ 4.,  5.,  6.,  7.],
       [ 8.,  9., 10., 11.],
       [12., 13., 14., 15.]])


np.hsplit(numpy_array26, 2) # Split an array into multiple sub-arrays horizontally (column-wise)

[array([[ 0.,  1.],
        [ 4.,  5.],
        [ 8.,  9.],
        [12., 13.]]),
 array([[ 2.,  3.],
        [ 6.,  7.],
        [10., 11.],
        [14., 15.]])]


np.hsplit(numpy_array26, 4)

[array([[ 0.],
        [ 4.],
        [ 8.],
        [12.]]),
 array([[ 1.],
        [ 5.],
        [ 9.],
        [13.]]),
 array([[ 2.],
        [ 6.],
        [10.],
        [14.]]),
 array([[ 3.],
        [ 7.],
        [11.],
        [15.]])]


np.vsplit(numpy_array26, 2)   # Split an array into multiple sub-arrays vertically (row-wise)

[array([[0., 1., 2., 3.],
        [4., 5., 6., 7.]]),
 array([[ 8.,  9., 10., 11.],
        [12., 13., 14., 15.]])]


np.vsplit(numpy_array26, 4)

[array([[0., 1., 2., 3.]]),
 array([[4., 5., 6., 7.]]),
 array([[ 8.,  9., 10., 11.]]),
 array([[12., 13., 14., 15.]])]
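The split functions also accept a list of indices instead of a count, and np.array_split handles cases where the array cannot be divided evenly. The following sketch assumes the same 4 x 4 array; parts and uneven are names chosen for the illustration.

```python
import numpy as np

numpy_array26 = np.arange(16.0).reshape(4, 4)

# A list of indices splits at those positions: rows [0:1], [1:3] and [3:]
parts = np.vsplit(numpy_array26, [1, 3])
print([p.shape for p in parts])          # [(1, 4), (2, 4), (1, 4)]

# array_split allows sections of unequal size (4 columns into 3 pieces)
uneven = np.array_split(numpy_array26, 3, axis=1)
print([p.shape for p in uneven])         # [(4, 2), (4, 1), (4, 1)]
```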

With this, we have covered nearly all the basic aspects of NumPy arrays. This concludes the posts on NumPy.