High-Speed Scientific Computing Using NumPy

by 자동매매 2023. 3. 20.

Chapter 3: High-Speed Scientific Computing Using NumPy

This chapter introduces us to NumPy, a high-speed Python library for matrix calculations. Most data science/algorithmic trading libraries are built upon NumPy’s functionality and conventions.

In this chapter, we are going to cover the following key topics:

Introduction to NumPy

Creating NumPy n-dimensional arrays (ndarrays)

Data types used with NumPy arrays

Indexing of ndarrays

Basic ndarray operations

File operations on ndarrays

Technical requirements

The Python code used in this chapter is available in the Chapter03/numpy.ipynb notebook in the book’s code repository.

Introduction to NumPy

Multidimensional heterogeneous arrays can be represented in Python using lists. A list is a 1D array, a list of lists is a 2D array, a list of lists of lists is a 3D array, and so on. However, this solution is complex, difficult to use, and extremely slow.

One of the primary design goals of the NumPy Python library was to provide high-performance, scalable structured arrays and vectorized computations.

Most data structures and operations in NumPy are implemented in compiled C code, which is the main reason for their speed advantage over pure-Python loops.
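
To make the contrast concrete, the following sketch computes the same sum of squares twice: once with a plain Python list and an explicit loop, and once with a single vectorized ndarray expression. The variable names are illustrative and are not taken from the chapter's notebook.

```python
import numpy as np

values = list(range(1_000))

# Pure-Python version: an explicit loop over the list elements.
total_list = 0
for v in values:
    total_list += v * v

# NumPy version: one vectorized expression evaluated in compiled code.
arr = np.array(values, dtype=np.int64)
total_np = int((arr * arr).sum())

print(total_list == total_np)  # both approaches give the same result
```

Both versions produce the same number; the difference is that the NumPy version avoids interpreting one Python bytecode loop iteration per element.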

Creating NumPy ndarrays

An ndarray is a fast, space-efficient data structure for multidimensional arrays.

First, we need to import the NumPy library, as follows:

import numpy as np

Next, we will start creating a 1D ndarray.

Creating 1D ndarrays

The following line of code creates a 1D ndarray:

arr1D = np.array([1.1, 2.2, 3.3, 4.4, 5.5]); arr1D

This will give the following output:

array([1.1, 2.2, 3.3, 4.4, 5.5])

Let’s inspect the type of the array with the following code:

type(arr1D)

This shows that the array is a NumPy ndarray, as can be seen here:

numpy.ndarray

We can easily create ndarrays of two dimensions or more.

Creating 2D ndarrays

To create a 2D ndarray, use the following code:

arr2D = np.array([[1, 2], [3, 4]]); arr2D

The result has two rows and each row has two values, so it is a 2 x 2 ndarray, as illustrated in the following code snippet:

array([[1, 2], [3, 4]])

Creating any-dimension ndarrays

ndarrays can have an arbitrary number of dimensions. The following code creates an ndarray of 2 x 2 x 2 x 2 dimensions:

arr4D = np.array(range(16)).reshape((2, 2, 2, 2)); arr4D

The representation of the array is shown here:

array([[[[ 0,  1],
         [ 2,  3]],

        [[ 4,  5],
         [ 6,  7]]],


       [[[ 8,  9],
         [10, 11]],

        [[12, 13],
         [14, 15]]]])

NumPy ndarrays have a shape attribute that describes the ndarray’s dimensions, as shown in the following code snippet:

arr1D.shape

The following snippet shows that arr1D is a one-dimensional array with five elements:

(5,)

We can inspect the shape attribute on arr2D with the following code:

arr2D.shape

As expected, the output describes it as being a 2 x 2 ndarray, as we can see here:

(2, 2)

In practice, there are certain matrices that are more frequently used, such as a matrix of 0s, a matrix of 1s, an identity matrix, a matrix containing a range of numbers, or a random matrix. NumPy provides support for generating these frequently used ndarrays with one command.

Creating an ndarray with np.zeros(...)

The np.zeros(...) method creates an ndarray populated with all 0s, as illustrated in the following code snippet:

np.zeros(shape=(2,5))

The output is all 0s, with dimensions being 2 x 5, as illustrated in the following code snippet:

array([[0., 0., 0., 0., 0.], [0., 0., 0., 0., 0.]])

Creating an ndarray with np.ones(...)

np.ones(...) is similar, but each element is set to 1 instead of 0. The method is shown in the following code snippet:

np.ones(shape=(2,2))

The result is a 2 x 2 ndarray with every value set to 1, as illustrated in the following code snippet:

array([[1., 1.], [1., 1.]])

Creating an ndarray with np.identity(...)

Often in matrix operations we need an identity matrix, which the np.identity(...) method creates, as illustrated in the following code snippet:

np.identity(3)

This creates a 3 x 3 identity matrix with 1s on the diagonal and 0s everywhere else, as illustrated in the following code snippet:

array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])

Creating an ndarray with np.arange(...)

np.arange(...) is the NumPy counterpart of Python's built-in range(...). It generates values from a start value up to, but not including, a stop value in steps of a given increment, and returns them as an ndarray, as shown here:

np.arange(5)

The ndarray returned is shown here:

array([0, 1, 2, 3, 4])

By default, values start at 0 and increment by 1.
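
As a small illustration of the start, stop, and increment arguments described above, the sketch below passes all three explicitly. The chosen numbers are arbitrary.

```python
import numpy as np

# start=2, stop=10 (exclusive), step=3 -> 2, 5, 8
stepped = np.arange(2, 10, 3)
print(stepped)

# A floating-point step is also accepted, unlike Python's range(...):
# the values are 0.0, 0.5, 1.0, 1.5
halves = np.arange(0.0, 2.0, 0.5)
print(halves)
```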

Creating an ndarray with np.random.randn(…)

np.random.randn(…) generates an ndarray of specified dimensions, with each element populated with random values drawn from a standard normal distribution (mean=0, std=1), as illustrated here:

np.random.randn(2,2)

The output is a 2 x 2 ndarray with random values, as illustrated in the following code snippet:

array([[ 0.57370365, -1.22229931], [-1.25539335, 1.11372387]])
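
Because the values are random, rerunning the code gives different numbers each time. When reproducible draws are needed (for example, in tests), one option is a seeded generator. The sketch below uses np.random.default_rng(...), which is NumPy's Generator interface and is an assumption here, since the chapter itself only uses np.random.randn(...); the seed value 42 is arbitrary.

```python
import numpy as np

# Two generators created with the same seed produce the same draws.
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

sample_a = rng_a.standard_normal((2, 2))  # 2 x 2 standard normal draws
sample_b = rng_b.standard_normal((2, 2))

print(np.array_equal(sample_a, sample_b))  # True
```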

Data types used with NumPy ndarrays

NumPy ndarrays are homogeneous; that is, each element in an ndarray has the same data type. This is different from Python lists, which can have elements with different data types (heterogeneous).

The np.array(...) method accepts an explicit dtype= parameter that lets us specify the data type that the ndarray should use. Common data types used are np.int32, np.float64, np.float128, and np.bool. Note that np.float128 is not supported on Windows.

The primary reason to be conscious of the various numeric types for ndarrays is memory usage: the more precision a data type provides, the more memory it requires. For certain operations, a smaller data type may be sufficient.
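
The memory trade-off can be checked directly with the nbytes attribute. The following sketch compares a 1,000-element array stored as float64 and as float32; the array name is illustrative.

```python
import numpy as np

prices64 = np.ones(1_000, dtype=np.float64)
prices32 = prices64.astype(np.float32)

print(prices64.nbytes)  # 8000 bytes: 1,000 elements x 8 bytes each
print(prices32.nbytes)  # 4000 bytes: 1,000 elements x 4 bytes each
```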

Creating a numpy.float64 array

To create a 64-bit floating-point array, use the following code:

np.array([-1, 0, 1], dtype=np.float64)

The output is shown here:

array([-1.,  0.,  1.])

Creating a numpy.bool array

We can create an ndarray by converting specified values to the target type. In the following code example, even though integer values are provided, the resulting ndarray has a bool dtype because the data type is specified explicitly. The code uses np.bool_ because the plain np.bool alias is unavailable in NumPy 1.24 through 1.26; the built-in bool works as well:

np.array([-1, 0, 1], dtype=np.bool_)

The values are shown here:

array([ True, False, True])

We observe that the integer values (-1, 0, 1) were converted to bool values (True, False, True). 0 gets converted to False, and all other values get converted to True.

ndarrays' dtype attribute

ndarrays have a dtype attribute to inspect the data type, as shown here:

arr1D.dtype

The output is a NumPy dtype object with a float64 value, as illustrated here:

dtype('float64')

Converting underlying data types of ndarray with numpy.ndarray.astype(...)

We can easily convert the underlying data type of an ndarray to any other compatible data type with the numpy.ndarray.astype(...) method. For example, to convert arr1D from np.float64 to np.int64, we use the following code:

arr1D.astype(np.int64).dtype

This reflects the new data type, as follows:

dtype('int64')

When numpy.ndarray.astype(...) converts floating-point values to an integer type, it truncates the fractional part (rounding toward zero), as follows:

arr1D.astype(np.int64)

This converts arr1D to the following integer-valued ndarray:

array([1, 2, 3, 4, 5])

The original floating values (1.1, 2.2, ...) are converted to their truncated integer values (1, 2, ...).
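
Truncation is easy to confuse with rounding, especially for negative values. The sketch below, using made-up values, contrasts astype(...) with rounding or flooring first.

```python
import numpy as np

mixed = np.array([-1.7, -0.2, 0.2, 1.7])

truncated = mixed.astype(np.int64)           # drops the fraction: -1, 0, 0, 1
rounded = np.round(mixed).astype(np.int64)   # nearest integer first: -2, 0, 0, 2
floored = np.floor(mixed).astype(np.int64)   # toward -infinity: -2, -1, 0, 1

print(truncated, rounded, floored)
```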

Indexing of ndarrays

Array indexing refers to the way of accessing a particular array element or elements. In NumPy, all ndarray indices are zero-based; that is, the first item of an array has index 0. Negative indices are understood as counting from the end of the array.

Direct access to an ndarray's element

Direct access to a single ndarray’s element is one of the most used forms of access. The following code builds a 3 x 3 random-valued ndarray for our use:

arr = np.random.randn(3,3); arr

The arr ndarray has the following elements:

array([[-0.04113926, -0.273338 , -1.05294723], [ 1.65004669, -0.09589629, 0.15586867], [ 0.39533427, 1.47193681, 0.32148741]])

We can index the first row with integer index 0, as follows:

arr[0]

This gives us the first row of the arr ndarray, as follows:

array([-0.04113926, -0.273338 , -1.05294723])

We can access the element at the second column of the first row by using the following code:

arr[0][1]

The result is shown here:

-0.2733379996693689

ndarrays also support an alternative notation to perform the same operation, as illustrated here:

arr[0, 1]

It accesses the same element as before, as can be seen here:

-0.2733379996693689

The numpy.ndarray[index_0, index_1, … index_n] notation is more concise and especially useful when accessing ndarrays with many dimensions.

Negative indices start from the end of the ndarray, as illustrated here:

arr[-1]

This returns the last row of the ndarray, as follows:

array([0.39533427, 1.47193681, 0.32148741])

ndarray slicing

While single ndarray access is useful, for bulk processing we require access to multiple elements of the array at once (for example, if the ndarray contains all daily prices of an asset, we might want to process only all Mondays’ prices).

Slicing allows access to multiple ndarray records in one command. Slicing ndarrays also works similarly to slicing of Python lists.

The basic slice syntax is i:j:k, where i is the index of the first record we want to include, j is the stopping index (which is itself excluded), and k is the step.
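
The i:j:k syntax can be sketched on a small 1D array. The prices below are hypothetical, and the example assumes day 0 is a Monday and that every week has exactly five trading days, which real calendars do not guarantee.

```python
import numpy as np

# Hypothetical closing prices for 15 consecutive trading days.
prices = np.arange(100, 115)

mondays = prices[0::5]       # start at 0, run to the end, step 5
middle = prices[1:8:2]       # indices 1, 3, 5, 7 (index 8 is excluded)
reversed_all = prices[::-1]  # a negative step walks backwards

print(mondays)  # the values 100, 105, 110
```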

Accessing all ndarray elements after the first one

To access all elements after the first one, we can use the following code:

arr[1:]

This returns all the rows after the first one, as illustrated in the following code snippet:

array([[ 1.65004669, -0.09589629, 0.15586867], [ 0.39533427, 1.47193681, 0.32148741]])

Fetching all rows, starting from row 2 and columns 1 and 2

Similarly, to fetch all rows starting from the second one, and columns up to but not including the third one, run the following code:

arr[1:, :2]

This is a 2 x 2 ndarray as expected, as can be seen here:

array([[ 1.65004669, -0.09589629], [ 0.39533427, 1.47193681]])

Slicing with negative indices

More complex slicing notation that mixes positive and negative index ranges is also possible, as follows:

arr[1:2, -2:-1]

This is a less intuitive way of finding the slice of an element at the second row and at the second column, as illustrated here:

array([[-0.09589629]])

Slicing with no indices

Slicing with no indices yields the entire row/column. The following code returns all elements of the third row; since arr[:] is a view of the whole array, arr[:][2] is equivalent to arr[2]:

arr[:][2]

The output is shown here:

array([0.39533427, 1.47193681, 0.32148741])

The following code generates a slice of the original arr ndarray:

arr[:][:]

The output is shown here:

array([[-0.04113926, -0.273338 , -1.05294723], [ 1.65004669, -0.09589629, 0.15586867], [ 0.39533427, 1.47193681, 0.32148741]])

Setting values of a slice to 0

Frequently, we will need to set certain values of an ndarray to a given value.

Let’s generate a slice containing the second row of arr and assign it to a new variable, arr1, as follows:

arr1 = arr[1:2]; arr1

arr1 now contains the second row, as shown in the following code snippet:

array([[ 1.65004669, -0.09589629, 0.15586867]])

Now, let’s set every element of arr1 to the value 0, as follows:

arr1[:] = 0; arr1

As expected, arr1 now contains all 0s, as illustrated here:

array([[0., 0., 0.]])

Now, let’s re-inspect our original arr ndarray, as follows:

arr

The output is shown here:

array([[-0.04113926, -0.273338 , -1.05294723], [ 0. , 0. , 0. ], [ 0.39533427, 1.47193681, 0.32148741]])

We see that our operation on the arr1 slice also changed the original arr ndarray. This brings us to the most important point: ndarray slices are views into the original ndarrays, not copies.

It is important to remember this when working with ndarrays so that we do not inadvertently change something we did not mean to. This design is purely for efficiency reasons, since copying large ndarrays incurs large overheads.

To create a copy of an ndarray, we explicitly call the numpy.ndarray.copy(...) method, as follows:

arr_copy = arr.copy()

Now, let’s change some values in the arr_copy ndarray, as follows:

arr_copy[1:2] = 1; arr_copy

We can see the change in arr_copy in the following code snippet:

array([[-0.04113926, -0.273338 , -1.05294723], [ 1. , 1. , 1. ], [ 0.39533427, 1.47193681, 0.32148741]])

Let’s inspect the original arr ndarray as well, as follows:

arr

The output is shown here:

array([[-0.04113926, -0.273338 , -1.05294723], [ 0. , 0. , 0. ], [ 0.39533427, 1.47193681, 0.32148741]])

We see that the original ndarray is unchanged since arr_copy is a copy of arr and not a reference/view to it.
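
One way to check whether two ndarrays are backed by the same memory, rather than inferring it from side effects, is np.shares_memory(...). The sketch below uses a small deterministic array instead of the random arr from the text.

```python
import numpy as np

base = np.arange(9.0).reshape((3, 3))
row_view = base[1:2]          # basic slice: a view into base
row_copy = base[1:2].copy()   # explicit, independent copy

print(np.shares_memory(base, row_view))  # True
print(np.shares_memory(base, row_copy))  # False

row_view[:] = 0   # writes through to base
row_copy[:] = 1   # leaves base untouched
```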

Boolean indexing

NumPy provides multiple ways of indexing ndarrays. NumPy arrays can be indexed by using conditions that evaluate to True or False. Let’s start by regenerating an arr ndarray, as follows:

arr = np.random.randn(3,3); arr

This is a 3 x 3 ndarray with random values, as can be seen in the following code snippet:

array([[-0.50566069, -0.52115534, 0.0757591 ], [ 1.67500165, -0.99280199, 0.80878346], [ 0.56937775, 0.36614928, -0.02532004]])

Let’s look at the output of the following code, which simply calls the np.less(...) universal function (ufunc); that is, the result is identical to calling np.less(arr, 0):

arr < 0

This generates another ndarray of True and False values, where True means the corresponding element in arr was negative and False means the corresponding element in arr was not negative, as illustrated in the following code snippet:

array([[ True, True, False], [False, True, False], [False, False, True]])

We can use that array as an index to arr to find the actual negative elements, as follows:

arr[(arr < 0)]

As expected, this fetches the following negative values:

array([-0.50566069, -0.52115534, -0.99280199, -0.02532004])

We can combine multiple conditions with the & (and) and | (or) operators. Python’s and and or keywords do not work on ndarrays, since they expect a single truth value rather than element-wise results. An example of the & operator is shown here:

(arr > -1) & (arr < 1)

This generates an ndarray with the value True where the elements are between -1 and 1, and False otherwise, as illustrated in the following code snippet:

array([[ True, True, True], [False, True, True], [ True, True, True]])

As we saw before, we can use that Boolean array to index arr and find the actual elements, as follows:

arr[((arr > -1) & (arr < 1))]

The following output is an array of the elements that satisfy the condition:

array([-0.50566069, -0.52115534, 0.0757591 , -0.99280199, 0.80878346,

0.56937775, 0.36614928, -0.02532004])
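
The difference between the element-wise operators and Python's and keyword can be seen in a short sketch. The data values are made up; the point is that & yields a mask, while and raises a ValueError for multi-element arrays.

```python
import numpy as np

data = np.array([-2.0, -0.5, 0.5, 2.0])

# Element-wise AND. The parentheses are required because & binds
# more tightly than the comparison operators.
mask = (data > -1) & (data < 1)
selected = data[mask]

# The `and` keyword needs a single truth value for each operand,
# which is ambiguous for an array with more than one element.
try:
    (data > -1) and (data < 1)
    raised = False
except ValueError:
    raised = True

print(selected, raised)
```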

Indexing with arrays

ndarray indexing also allows us to directly pass lists of indices of interest. We reuse the arr ndarray of random values from the previous section, as follows:

arr

The output is shown here:

array([[-0.50566069, -0.52115534, 0.0757591 ], [ 1.67500165, -0.99280199, 0.80878346], [ 0.56937775, 0.36614928, -0.02532004]])

We can select the first and third rows, using the following code:

arr[[0, 2]]

The output is a 2 x 3 ndarray containing the two rows, as illustrated here:

array([[-0.50566069, -0.52115534, 0.0757591 ], [ 0.56937775, 0.36614928, -0.02532004]])

We can combine row and column indexing using arrays, as follows:

arr[[0, 2], [1]]

The preceding code gives us the second column of the first and third rows (the single column index is broadcast across both row indices), as follows:

array([-0.52115534, 0.36614928])

We can also change the order of the indices passed, and this is reflected in the output. The following code picks out the third row followed by the first row, in that order:

arr[[2, 0]]

The output reflects the two rows in the order we expected (third row first; first row second), as illustrated in the following code snippet:

array([[ 0.56937775, 0.36614928, -0.02532004], [-0.50566069, -0.52115534, 0.0757591 ]])
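
When two index lists are passed together, NumPy pairs them element by element rather than taking every row and column combination. The sketch below, on a small deterministic array, contrasts that pairing with np.ix_(...), which selects the full sub-block; np.ix_ is not used in the chapter and is shown here only as one possible way to get the block.

```python
import numpy as np

grid = np.arange(9).reshape((3, 3))
# grid is [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

pairs = grid[[0, 2], [0, 1]]          # elements at (0, 0) and (2, 1)
block = grid[np.ix_([0, 2], [0, 1])]  # rows 0 and 2 crossed with columns 0 and 1

print(pairs)
print(block)
```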

Now that we have learned how to create ndarrays and about the various ways to retrieve the values of their elements, let’s discuss the most common ndarray operations.

Basic ndarray operations

In the following examples, we will use an arr2D ndarray, as illustrated here:

arr2D

This is a 2 x 2 ndarray with values from 1 to 4, as shown here:

array([[1, 2], [3, 4]])

Scalar multiplication with an ndarray

Scalar multiplication with an ndarray has the effect of multiplying each element of the ndarray, as illustrated here:

arr2D * 4

The output is shown here:

array([[ 4, 8], [12, 16]])

Linear combinations of ndarrays

The following operation is a combination of scalar and ndarray operations, as well as operations between ndarrays:

2*arr2D + 3*arr2D

The output is what we would expect, as can be seen here:

array([[ 5, 10], [15, 20]])

Exponentiation of ndarrays

We can raise each element of the ndarray to a certain power, as illustrated here:

arr2D ** 2

The output is shown here:

array([[ 1, 4], [ 9, 16]])

Addition of an ndarray with a scalar

Addition of an ndarray with a scalar works similarly, as illustrated here:

arr2D + 10

The output is shown here:

array([[11, 12], [13, 14]])

Transposing a matrix

Finding the transpose of a matrix, which is a common operation, is possible in NumPy with the numpy.ndarray.transpose(...) method, as illustrated in the following code snippet:

arr2D.transpose()

This transposes the ndarray and outputs it, as follows:

array([[1, 3], [2, 4]])

Changing the layout of an ndarray

The np.ndarray.reshape(...) method allows us to change the layout (shape) of the ndarray to a compatible shape without changing its data.

For instance, to reshape arr2D from 2 x 2 to 4 x 1, we use the following code:

arr2D.reshape((4, 1))

The new reshaped 4 x 1 ndarray is displayed here:

array([[1], [2], [3], [4]])

The following code example combines np.random.randn(...) and np.ndarray.reshape(...) to create a 3 x 3 ndarray of random values:

arr = np.random.randn(9).reshape((3,3)); arr

The generated 3 x 3 ndarray is shown here:

array([[ 0.24344963, -0.53183761, 1.08906941], [-1.71144547, -0.03195253, 0.82675183], [-2.24987291, 2.60439882, -0.09449784]])
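
What counts as a "compatible shape" can be sketched as follows: the total number of elements must stay the same, and one dimension may be given as -1 so that NumPy infers it. The array here is illustrative.

```python
import numpy as np

flat = np.arange(12)

two_rows = flat.reshape((2, -1))   # -1 is inferred as 6
columns = flat.reshape((-1, 3))    # -1 is inferred as 4

# An incompatible shape raises ValueError: 12 elements cannot fill 5 columns.
try:
    flat.reshape((-1, 5))
    failed = False
except ValueError:
    failed = True

print(two_rows.shape, columns.shape, failed)
```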

Finding the minimum value in an ndarray

To find the minimum value in an ndarray, we use the following command:

np.min(arr)

The result is shown here:

-2.249872908111852

Calculating the absolute value

The np.abs(...) method, shown here, calculates the absolute value of each element of an ndarray:

np.abs(arr)

The output ndarray is shown here:

array([[0.24344963, 0.53183761, 1.08906941], [1.71144547, 0.03195253, 0.82675183], [2.24987291, 2.60439882, 0.09449784]])

Calculating the mean of an ndarray

The np.mean(...) method, shown here, calculates the mean of all elements in the ndarray:

np.mean(arr)

The mean of the elements of arr is shown here:

0.01600703714906236

We can find the mean along the columns by specifying the axis= parameter, as follows:

np.mean(arr, axis=0)

This returns the following array, containing the mean for each column:

array([-1.23928958, 0.68020289, 0.6071078 ])

Similarly, we can find the mean along the rows by running the following code:

np.mean(arr, axis=1)

That returns the following array, containing the mean for each row:

array([ 0.26689381, -0.30554872, 0.08667602])
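
The meaning of axis= is easier to verify on a small array with round numbers. In the sketch below, axis=0 collapses the rows (giving one value per column) and axis=1 collapses the columns (giving one value per row).

```python
import numpy as np

m = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

col_means = np.mean(m, axis=0)  # one value per column: 2.5, 3.5, 4.5
row_means = np.mean(m, axis=1)  # one value per row: 2.0, 5.0

print(col_means, row_means)
```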

Finding the index of the maximum value in an ndarray

Often, we’re interested in finding where in an array its largest value is. The np.argmax(...) method finds the location of the maximum value in the ndarray, as follows:

np.argmax(arr)

This returns the following value, which is the position of the maximum value (2.60439882) in the flattened ndarray:

7

The np.argmax(...) method also accepts the axis= parameter to perform the operation row-wise or column-wise, as illustrated here:

np.argmax(arr, axis=1)

This finds the location of the maximum value on each row, as follows:

array([2, 2, 1], dtype=int64)
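
Without axis=, np.argmax(...) returns an index into the flattened array. A minimal sketch of converting that flat index back to a (row, column) pair with np.unravel_index(...) is shown below; the matrix uses rounded stand-ins for the values in the text.

```python
import numpy as np

m = np.array([[0.2, -0.5, 1.1],
              [-1.7, 0.0, 0.8],
              [-2.2, 2.6, -0.1]])

flat_index = np.argmax(m)                         # index into the flattened array
row, col = np.unravel_index(flat_index, m.shape)  # back to (row, column)

print(flat_index, row, col)
```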

Calculating the cumulative sum of elements of an ndarray

To calculate a running total, NumPy provides the np.cumsum(...) method, illustrated here, which finds the cumulative sum of the elements in the ndarray:

np.cumsum(arr)

The output provides the cumulative sum after each additional element, as follows:

array([ 0.24344963, -0.28838798, 0.80068144, -0.91076403, -0.94271656,

-0.11596474, -2.36583764, 0.23856117, 0.14406333])

Notice the difference between a cumulative sum and a sum. A cumulative sum is an array of a running total, whereas a sum is a single number.

Applying the axis= parameter to the cumsum method works similarly, as illustrated in the following code snippet:

np.cumsum(arr, axis=1)

This goes row-wise and generates the following array output:

array([[ 0.24344963, -0.28838798, 0.80068144], [-1.71144547, -1.743398 , -0.91664617], [-2.24987291, 0.35452591, 0.26002807]])

Finding NaNs in an ndarray

Missing or unknown values are often represented in NumPy using a Not a Number (NaN) value. For many numerical methods, these must be removed or replaced, for example by interpolation.

First, let’s set the second row to np.nan, as follows:

arr[1, :] = np.nan; arr

The new ndarray has the NaN values, as illustrated in the following code snippet:

array([[ 0.64296696, -1.35386668, -0.63063743], [ nan, nan, nan], [-0.19093967, -0.93260398, -1.58520989]])

The np.isnan(...) ufunc finds if values in an ndarray are NaNs, as follows:

np.isnan(arr)

The output is an ndarray with a True value where NaNs exist and a False value where NaNs do not exist, as illustrated in the following code snippet:

array([[False, False, False], [ True, True, True], [False, False, False]])
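
Once the NaNs have been located, they can be dropped or replaced. The following sketch shows a few common options on a small made-up matrix; the fill value 0.0 is an arbitrary choice, not a recommendation.

```python
import numpy as np

m = np.array([[1.0, 2.0, 3.0],
              [np.nan, np.nan, np.nan],
              [4.0, 5.0, 6.0]])

plain_mean = np.mean(m)          # NaN propagates through ordinary reductions
nan_aware_mean = np.nanmean(m)   # ignores NaNs: 21 / 6 = 3.5

# Keep only the rows that contain no NaN at all.
clean_rows = m[~np.isnan(m).any(axis=1)]

# Or replace NaNs with a fill value.
filled = np.where(np.isnan(m), 0.0, m)

print(nan_aware_mean, clean_rows.shape)
```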

Finding the truth values of x1>x2 of two ndarrays

Boolean ndarrays are an efficient way of obtaining indices for values of interest. Using Boolean ndarrays is far more performant than looping over the matrix elements one by one.

Let’s build another arr1 ndarray with random values, as follows:

arr1 = np.random.randn(9).reshape((3,3));

arr1

The result is a 3 x 3 ndarray, as illustrated in the following code snippet:

array([[ 0.32102068, -0.51877544, -1.28267292], [-1.34842617, 0.61170993, -0.5561239 ], [ 1.41138027, -2.4951374 , 1.30766648]])

Similarly, let’s build another arr2 ndarray, as follows:

arr2 = np.random.randn(9).reshape((3,3)); arr2

The output is shown here:

array([[ 0.33189432, 0.82416396, -0.17453351], [-1.59689203, -0.42352094, 0.22643589], [-1.80766151, 0.26201455, -0.08469759]])

The np.greater(...) function is a binary ufunc that generates a True value when the left-hand-side value in the ndarray is greater than the right-hand-side value in the ndarray. This function can be seen here:

np.greater(arr1, arr2)

The output is an ndarray of True and False values as described previously, as we can see here:

array([[False, False, False], [ True, True, False], [ True, False, True]])

The > infix operator, shown in the following snippet, is a shorthand of numpy.greater(...):

arr1 > arr2

The output is the same, as we can see here:

array([[False, False, False], [ True, True, False], [ True, False, True]])

any and all Boolean operations on ndarrays

In addition to relational operators, NumPy supports additional methods for testing conditions on matrices’ values.

The following code generates an ndarray containing True for elements that satisfy the condition, and False otherwise:

arr_bool = (arr > -0.5) & (arr < 0.5); arr_bool

The output is shown here:

array([[False, False, True], [False, False, False], [False, True, True]])

The following numpy.ndarray.any(...) method returns True if any element is True and otherwise returns False:

arr_bool.any()

Here, we have at least one element that is True, so the output is True, as shown here:

True

Again, it accepts the common axis= parameter and behaves as expected, as we can see here:

arr_bool.any(axis=1)

The operation performed row-wise yields the following:

array([True, False, True])

The following numpy.ndarray.all(...) method returns True when all elements are True, and False otherwise:

arr_bool.all()

This returns the following, since not all elements are True:

False

It also accepts the axis= parameter, as follows:

arr_bool.all(axis=1)

Again, each row has at least one False value, so every entry of the output is False, as shown here:

array([False, False, False])

Sorting ndarrays

Finding an element in a sorted ndarray is faster than processing all elements of the ndarray. Let’s generate a 1D random array, as follows:

arr1D = np.random.randn(10); arr1D

The ndarray contains the following data:

array([ 1.14322028, 1.61792721, -1.01446969, 1.26988026, -0.20110113,

-0.28283051, 0.73009565, -0.68766388, 0.27276319, -0.7135162 ])

The np.sort(...) method is pretty straightforward, as can be seen here:

np.sort(arr1D)

The output is shown here:

array([-1.01446969, -0.7135162 , -0.68766388, -0.28283051, -0.20110113,

0.27276319, 0.73009565, 1.14322028, 1.26988026, 1.61792721])

Let’s inspect the original ndarray to see if it was modified by the numpy.sort(...) operation, as follows:

arr1D

The following output shows that the original array is unchanged:

array([ 1.14322028, 1.61792721, -1.01446969, 1.26988026, -0.20110113,

-0.28283051, 0.73009565, -0.68766388, 0.27276319, -0.7135162 ])

The following np.argsort(...) method returns the indices that would sort the array; that is, the k-th entry of the result is the index, in the original array, of the k-th smallest element:

np.argsort(arr1D)

The output of this operation generates the following array:

array([2, 9, 7, 5, 4, 8, 6, 0, 3, 1])

NumPy ndarrays have the numpy.ndarray.sort(...) method as well, which sorts arrays in place. This method is illustrated in the following code snippet:

arr1D.sort()
np.argsort(arr1D)

After the call to sort(), we call numpy.argsort(...) to make sure the array was sorted, and this yields the following array that confirms that behavior:

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
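
Two typical uses of these sorting tools can be sketched as follows: reordering a second, aligned array with the indices from np.argsort(...), and looking up a position in a sorted array with np.searchsorted(...), which performs a binary search. The tickers and prices are hypothetical, and np.searchsorted is not covered in the chapter.

```python
import numpy as np

prices = np.array([30.0, 10.0, 20.0])
tickers = np.array(["CCC", "AAA", "BBB"])  # hypothetical labels aligned with prices

order = np.argsort(prices)       # indices that sort prices: 1, 2, 0
sorted_prices = prices[order]    # 10.0, 20.0, 30.0
sorted_tickers = tickers[order]  # reordered so labels stay aligned

# Binary search: the position at which 20.0 is found (or would be inserted).
pos = np.searchsorted(sorted_prices, 20.0)

print(sorted_tickers, pos)
```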

Searching within ndarrays

Finding indices of elements where a certain condition is met is a fundamental operation on an ndarray.

First, we start with an ndarray with consecutive values, as illustrated here:

arr1 = np.array(range(1, 11)); arr1

This creates the following ndarray:

array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

We create a second ndarray based on the first one, except this time the values in the second one are multiplied by 1000, as illustrated in the following code snippet:

arr2 = arr1 * 1000; arr2

Then, we know arr2 contains the following data:

array([ 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000,

10000])

We define another ndarray that contains 10 True and False values randomly, as follows:

cond = np.random.randn(10) > 0;

cond

The values in the cond ndarray are shown here:

array([False, False, True, False, False, True, True, True, False, True])

The np.where(...) method allows us to select values from one ndarray or another, depending on the condition being True or False. The following code will generate an ndarray with a value picked from arr1 when the corresponding element in the cond array is True; otherwise, the value is picked from arr2:

np.where(cond, arr1, arr2)

The returned array is shown here:

array([1000, 2000, 3, 4000, 5000, 6, 7, 8, 9000, 10])
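
Since this section is about finding the indices where a condition holds, it may help to note that np.where(...) called with only the condition returns those indices instead of selected values. A minimal sketch, reusing the cond values printed above:

```python
import numpy as np

cond = np.array([False, False, True, False, False,
                 True, True, True, False, True])

# With a single argument, np.where returns a tuple of index arrays,
# one per dimension; [0] picks the only one for a 1D input.
indices = np.where(cond)[0]

# np.flatnonzero gives the same positions for this purpose.
same = np.flatnonzero(cond)

print(indices)  # positions 2, 5, 6, 7, 9
```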

File operations on ndarrays

Most NumPy arrays are read in from files and, after processing, written back out to files.

File operations with text files

The key advantages of text files are that they are human-readable and compatible with any custom software.

Let’s start with the following random array:

arr

This array contains the following data:

array([[-0.50566069, -0.52115534, 0.0757591 ], [ 1.67500165, -0.99280199, 0.80878346], [ 0.56937775, 0.36614928, -0.02532004]])

The numpy.savetxt(...) method saves the ndarray to disk in text format. The following example uses a fmt='%0.2lf' format string and specifies a comma delimiter:

np.savetxt('arr.csv', arr, fmt='%0.2lf', delimiter=',')

Let’s inspect the arr.csv file written out to disk in the current directory, as follows:

!cat arr.csv

The comma-separated values (CSV) file contains the following data:

-0.51,-0.52,0.08
1.68,-0.99,0.81
0.57,0.37,-0.03

The numpy.loadtxt(...) method loads an ndarray from a text file into memory. Here, we explicitly specify the delimiter=',' parameter, as follows:

arr_new = np.loadtxt('arr.csv', delimiter=','); arr_new

And the ndarray read in from the text file contains the following data:

array([[-0.51, -0.52, 0.08], [ 1.68, -0.99, 0.81], [ 0.57, 0.37, -0.03]])

File operations with binary files

Binary files are far more efficient for computer processing: they save and load more quickly and are smaller than text files. However, their format may not be supported by other software.

The numpy.save(...) method stores ndarrays in a binary format, as illustrated in the following code snippet:

np.save('arr', arr)
!cat arr.npy

The output is the raw binary content of the arr.npy file, which is not human-readable.

The numpy.save(...) method automatically assigns the .npy extension to binary files it creates.

The numpy.load(...) method, shown in the following code snippet, is used for reading binary files:

arr_new = np.load('arr.npy'); arr_new

The newly read-in ndarray is shown here:

array([[-0.50566069, -0.52115534, 0.0757591 ], [ 1.67500165, -0.99280199, 0.80878346], [ 0.56937775, 0.36614928, -0.02532004]])

Another advantage of binary file formats is that floating-point values are stored at full precision, which is not always possible with text files, where formatting the values can lose precision.

Let’s check if the old arr ndarray and the newly read-in arr_new array match exactly, by running the following code:

arr == arr_new

This will generate the following array, containing True if the elements are equal and False otherwise:

array([[ True, True, True], [ True, True, True], [ True, True, True]])

So, we see that each element matches exactly.
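
The precision difference between the two formats can be reproduced in a self-contained sketch. It writes a small made-up array to a temporary directory in both formats, reads it back, and compares; the file names and the "%0.2f" format are assumptions for illustration.

```python
import os
import tempfile

import numpy as np

original = np.array([[0.123456789, 1.0], [-2.5, 3.14159265]])

with tempfile.TemporaryDirectory() as tmp:
    txt_path = os.path.join(tmp, "arr.csv")
    npy_path = os.path.join(tmp, "arr.npy")

    np.savetxt(txt_path, original, fmt="%0.2f", delimiter=",")
    np.save(npy_path, original)

    from_text = np.loadtxt(txt_path, delimiter=",")
    from_binary = np.load(npy_path)

exact_text = np.array_equal(original, from_text)      # False: rounded to 2 decimals
exact_binary = np.array_equal(original, from_binary)  # True: values preserved exactly
close_text = np.allclose(original, from_text, atol=0.005)

print(exact_text, exact_binary, close_text)
```

np.array_equal(...) reduces the element-wise comparison to a single True or False, and np.allclose(...) is one way to compare arrays when a small rounding error is acceptable.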

Summary

In this chapter, we have learned how to create matrices of any dimension in Python, how to access the matrices’ elements, how to calculate basic linear algebra operations on matrices, and how to save and load matrices.

Working with NumPy matrices is a principal operation for any data analysis since vector operations are machine-optimized and thus are much faster than operations on Python lists, usually between 5 and 100 times faster. Backtesting any algorithmic strategy typically consists of processing enormous matrices, and then the speed difference can translate to hours or days of saved time.

In the next chapter, we introduce the second most important library for data analysis: Pandas, built upon NumPy. Pandas provides support for data manipulations based upon DataFrames (a DataFrame is the Python version of an Excel worksheet; that is, a two-dimensional data structure where each column has its own type).
