Numerical Python - numpy

4 분 소요

어떻게 행렬과 매트릭스를 코드로 표현할 것인가?

리스트로 방정식 표현하기

coefficients = [[2, 2, 1], [2, -1, 2], [1, -1, 2]]
constant = [9, 6, 5]

생겨나는 문제
- 다양한 matrix 계산을 어떻게 만들 것인가?
- 굉장히 큰 matrix에 대한 표현
- 처리 속도 문제(interpreter 언어)

파이썬 과학 처리 패키지 numpy

개요

고성능 과학 계산용 패키지
matrix와 vector와 같은 array 연산의 사실상의 표준

특징

일반 List에 비해 빠르고, 메모리 효율적
반복문 없이 데이터 배열에 대한 처리를 지원함
선형대수와 관련된 다양한 기능을 제공
C, C++, 포트란 등의 언어와 통합 가능

ndarray

import numpy as np

numpy 호출 방법, 세계적인 약속
arrray 생성

test_array = np.arrap([1, 4, 5, 8], float)

np.array를 활용해서 배열을 생성
numpy는 하나의 데이터 type만 배열에 넣을 수 있음
list와의 차이점 : Dynamic typing을 지원하지 않음
C의 Array를 사용하여 배열을 생성함

Why python is slow

numpy array는 값을 생성하면 차례대로 메모리의 공간에 할당이 됨
- 데이터가 붙어있음
- 연산이 빠름
python은 -5부터 256까지의 값이 메모리의 메모리의 static한 공간에 있음
- 그래서 값을 할당한다는 것은 그 주소값을 리스트에 저장한다는 뜻
- 단계가 하나가 더 생김
- 리스트의 변형은 쉬우나 연산이 느려짐

a = [1, 2, 3, 4, 5]
b = [5, 4, 3, 2, 1]
a is b # False
a[0] is b[-1] # True

a = np.array(a)
b = np.array(b)
a[0] is b[-1] # False

shape : numpy array의 dimensions을 반환함
type : numpy array의 데이터 타입을 반환

Rank	Name	Example
0	scalar	7
1	vector	[10,10]
2	matrix	[[10, 10], [15,15]]
3	3-tensor	[[[1, 5, 9], [2, 6, 10]], [[3, 7, 11], [4, 8, 12]]]
n	n-tensor

ndim : number of dimensions, rank

Handling shape

Reshape

Array의 shape의 크기를 변경함
element의 갯수는 동일

test_matrix = [[1, 2, 3, 4], [5, 6, 7, 8]]
print(np.array(test_matrix).shape)
# (2, 4)
print(test_matrix.shape)
# AttributeError: 'list' object has no attribute 'shape'


np.array(test_matrix).reshape(8,)
# [1, 2, 3, 4, 5, 6, 7, 8]
#(8,)

Comparisons

All & Any

a = np.arange(10)
np.any(a>5) # True
np.all(a>5) # False

comparison operation #1

numpy는 배열의 크기가 동일할 때 element-wise 비교 결과를 bool로 반환

a = np.array([-1, 0, 1, 2, 3, 4])
result = np.logical_and(a > 0, a < 3)
# [False False  True  True False False]
b = np.array([True, False, True, False])
np.logical_not(b)
# [False  True False  True]
c = np.array([False, False, True, True])
np.logical_or(b, c)
# [ True False  True  True]

np.where

배열 a의 조건에 따라 다른 값을 선택하는 연산
1. 조건
2. 조건이 참(True)일 때 선택할 값
3. 조건이 거짓(False)일 때 선택할 값

a = np.array([-1, 0, 1, 2, -2, 3])
result = np.where(a > 0, 3, 2)
# [2 2 3 3 2 3]

a = np.arange(10)
np.where(a>5)
# (array([6, 7, 8, 9]),)
a = np.arange(10,20,2)
print(a)
# [10 12 14 16 18]
print(np.where(a>15))
# (array([3, 4]),) 조건을 만족하는 인덱스 값을 반환

a = np.array([1, np.NaN, np.Inf], float)
# [ 1. nan inf]
np.isnan(a)
# [False  True False]
np.isfinite(a)
# [ True False False]

argmax & argmin

array 내 최댓값 또는 최솟값의 index를 반환함

a = np.array([1, 2, 3, 4, 5, 8, 78, 23, 3])
np.argmax(a), np.argmin(a)
(5, 0)

a = np.random.randint(100, size=(3, 4))
print(a)
# [[74  1 44 22]
# [87 49 55 37]
# [61 25 79 73]]
print(np.argmax(a), np.argmin(a)) # 배열을 1차원으로 평탄화 한 후 인덱스 반환
a = np.random.randint(100, size=(3, 4))
# [[49 92 70  0]
# [37 16 54 33]
# [ 3 48 77 33]]
print(np.argmax(a, axis=1), np.argmin(a, axis=0))
# [1 2 2] [2 1 1 0]
sorted_indices = a.argsort() # 오름차순으로 정렬
# 정렬된 인덱스 배열:
# [[3 0 2 1]
#  [1 3 0 2]
#  [0 3 1 2]]
sorted_a = np.array([row[idx] for row, idx in zip(a, sorted_indices)])
1. 첫 번째 루프: row = [49, 92, 70, 0], idx = [3, 0, 2, 1]
  - 정렬된 행: [row[3], row[0], row[2], row[1]] = [0, 49, 70, 92]
2. 두 번째 루프: row = [37, 16, 54, 33], idx = [1, 3, 0, 2]
  - 정렬된 행: [row[1], row[3], row[0], row[2]] = [16, 33, 37, 54]
3. 세 번째 루프: row = [3, 48, 77, 33], idx = [0, 3, 1, 2]
  - 정렬된 행: [row[0], row[3], row[1], row[2]] = [3, 33, 48, 77]
reverse_sorted_indices = a.argsort()[:, ::-1] # 2차원 배열일 떄
reverse_sorted_indices = a.argsort()[::-1] # 1차원 배열일 때

boolean & fancy index

boolean

특정 조건에 따른 값을 배열 형태로 추출
comparison operation 함수 모두 사용가능

test_array = np.array([1, 4, 0, 2, 3, 8, 9, 7], float)
test_array > 3
# array([False, True, False, False, False, True, True, True],dtype=bool)
test_array(test_array>3) # 조건이 True인 index의 element만 추출

fancy index

array를 index value로 사용해서 값 추출

a = np.array([2, 3, 6, 8], float)
b = np.array([0, 0, 1, 3, 2, 1], int)
a[b]
# array([2., 2., 3., 8., 6., 3., 2.])
a.take(b) # 같은 효과
# array([2., 2., 3., 8., 6., 3., 2.])

a = np.array([1, 4], [9, 16], float)
b = np.array([0, 0, 1, 1, 0], int)
c = np.array([0, 1, 1, 0, 1], int)
a[b, c]
# array([1., 4., 16., 9., 4.])

boolean index와 fancy index의 차이점

boolean index는 boolean list를 사용, fancy index는 integer list를 사용
boolean index는 원래 array와 shape이 같아야 한다.

numpy data i/o

loadtxt & savetxt

text type의 데이터를 일고, 저장하는 기능

a = np.loadtxt("file_name.txt")
a_int = a.astype(int)
np.savetxt('file_name.csv', a_int, fmt="%.2e", delimiter=",")
np.save("npy_test_object", arr=a) # np.save는 자동으로 npy 확장자로 저장

Twitter Facebook LinkedIn

gityeop

Numerical Python - numpy

리스트로 방정식 표현하기

파이썬 과학 처리 패키지 numpy

개요

특징

ndarray

Why python is slow

Handling shape

Reshape

Comparisons

All & Any

comparison operation #1

np.where

argmax & argmin

boolean & fancy index

boolean

fancy index

boolean index와 fancy index의 차이점

numpy data i/o

공유하기

댓글남기기

참고

LangChain 튜토리얼 가이드

Inverted Index란?

Datacentric-NLP-Wrap-up-Report

LASER Language-Agnostic Sentence Representations