Statistical Data Science
  • Lab 00: Python Primer

Python Primer

Author

Seoncheol Park

Getting Started

  • 파이썬

  • μ•„λ‚˜μ½˜λ‹€

  • VS Code

  • Positron

print

Mutable and slice

Python Objects

  • Each object has a list of attributes

  • Any attribute attr of an object obj can be accessed via the dot notation: obj.attr
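
As a small illustration of the two points above (the variable names are chosen here for the example only), the following sketch lists the attributes of a list object and reads a few of them with dot notation:

```python
# Sketch: attributes of built-in objects, accessed as obj.attr.
x = [3, 1, 2]
attrs = dir(x)             # names of all attributes of the list object
x.sort()                   # a method is an attribute that can be called
z = 3 + 4j                 # a complex number has data attributes real and imag

print(x)                   # [1, 2, 3]
print("append" in attrs)   # True
print(z.real, z.imag)      # 3.0 4.0
```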

Types and Operators

  • Each object has a type:
    • str, int, float, …
  • reference
  • Operator overloading: Operators such as + and * can be defined for other data types as well
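
A minimal sketch of types and operator overloading, using only built-in types (the variable names are illustrative): the same + and * symbols mean addition for numbers but concatenation and repetition for strings and lists.

```python
# Every object has a type.
print(type("a"), type(1), type(1.0))
# <class 'str'> <class 'int'> <class 'float'>

# The same operators act differently depending on the operand types.
total = 2 + 3            # numeric addition
joined = "ab" + "cd"     # string concatenation
repeated = "ab" * 3      # string repetition
merged = [1, 2] + [3]    # list concatenation

print(total, joined, repeated, merged)
# 5 abcd ababab [1, 2, 3]
```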

Functions and Methods

  • Compute population mean and variance
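
One possible way to write such a function is sketched below. The name pop_mean_var is hypothetical, and the variance divides by n (population variance), not n - 1.

```python
def pop_mean_var(values):
    """Return the population mean and population variance of a sequence."""
    n = len(values)
    mean = sum(values) / n
    # Population variance: average squared deviation from the mean.
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var


m, v = pop_mean_var([1, 2, 3, 4])
print(m, v)  # 2.5 1.25
```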

Modules

  • Module: a file (.py file) that groups related functions, classes, variables, and so on
    • datetime, matplotlib, numpy, os, pandas, …
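
The usual import forms can be sketched with standard-library modules only. The file path below is a made-up example and does not need to exist.

```python
import math                      # import a whole module
from datetime import date        # import one name from a module
import os.path as osp            # import under a shorter alias

d = date(2025, 3, 1)
print(math.sqrt(16.0))                  # 4.0
print(d.year)                           # 2025
print(osp.basename("data/lab00.py"))    # lab00.py (hypothetical path)
```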

Flow Control

  • while, for, if-else, …
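
A short sketch of these constructs (variable names are illustrative): a for loop with an if-else branch, followed by a while loop that adds 1, 2, 3, ... until the running total reaches 10.

```python
# for loop with if-else: label the integers 1..5 as odd or even.
labels = []
for k in range(1, 6):
    if k % 2 == 0:
        labels.append("even")
    else:
        labels.append("odd")
print(labels)  # ['odd', 'even', 'odd', 'even', 'odd']

# while loop: keep adding until the total is at least 10.
total, k = 0, 0
while total < 10:
    k += 1
    total += k
print(k, total)  # 4 10
```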

Iteration

Sets

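  • Python sets are unordered collections of unique objects, written with braces, e.g. {1, 2, 3}. An empty pair {} creates a dictionary, so an empty set is written set()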
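
A brief sketch of set behavior (the names A and B are illustrative): duplicates collapse on creation, and the usual set operations are available as operators.

```python
A = {3, 1, 2, 3, 1}      # duplicates are dropped
B = {2, 3, 4}

print(len(A))            # 3
print(A | B)             # union
print(A & B)             # intersection
print(A - B)             # difference
print(2 in A)            # True (membership test)

empty = set()            # {} would create an empty dict instead
```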

Lists

  • List: allows duplicates (the same value may appear several times), preserves order (elements are stored in the order they were added and can be accessed by index), and is mutable (elements can be added, removed, or changed)

  • Set: does not allow duplicate values and does not guarantee order (so elements cannot be accessed by index); elements can be added and removed, but an element at a specific position cannot be changed
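
The contrast between the two containers can be sketched as follows (the values are arbitrary): a list keeps duplicates and supports indexing and in-place changes, while a set drops duplicates and rejects indexing.

```python
items = [5, 3, 5]        # duplicates are kept, order is preserved
first = items[0]         # access by index
items.append(7)          # add an element      -> [5, 3, 5, 7]
items[1] = 4             # change an element   -> [5, 4, 5, 7]
del items[0]             # delete an element   -> [4, 5, 7]
print(first, items)      # 5 [4, 5, 7]

s = set(items)
s.add(9)                 # adding is allowed
s.discard(4)             # removing is allowed
try:
    s[0]                 # sets have no positions, so indexing fails
except TypeError:
    indexable = False
print(indexable)         # False
```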

Dictionary

  • Dictionary: made up of key:value pairs; values are accessed by key; keys cannot be duplicated, but values can
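
A minimal dictionary sketch (the names and ages are made-up data): two keys may share a value, whereas assigning to an existing key overwrites its value.

```python
ages = {"Kim": 31, "Lee": 27}
ages["Park"] = 31        # a repeated value is fine
ages["Kim"] = 32         # an existing key is overwritten, not duplicated

print(ages["Kim"])       # 32
print(len(ages))         # 3

# Iterate over key:value pairs.
for name, age in ages.items():
    print(name, age)
```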

Classes

  • Example of creating a new class
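
The original example is not reproduced here. As a stand-in, the sketch below defines a hypothetical Point class with a constructor, an ordinary method, and an overloaded + operator.

```python
class Point:
    """A point in the plane (illustrative class, not from the lab)."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def norm(self):
        # Euclidean distance from the origin.
        return (self.x ** 2 + self.y ** 2) ** 0.5

    def __add__(self, other):
        # Overload + so that two points add coordinate-wise.
        return Point(self.x + other.x, self.y + other.y)

    def __repr__(self):
        return f"Point({self.x}, {self.y})"


p = Point(3, 4)
q = p + Point(1, 1)
print(p.norm())  # 5.0
print(q)         # Point(4, 5)
```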

Files

  • Try the examples in the textbook yourself

Numpy

  • Trigonometric functions in Numpy
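
A small sketch of numpy's trigonometric functions, assuming numpy is installed and imported as np: they take angles in radians and act on whole arrays at once.

```python
import numpy as np

t = np.array([0.0, np.pi / 2, np.pi])   # angles in radians
s = np.sin(t)                           # approximately [0, 1, 0]
c = np.cos(t)                           # approximately [1, 0, -1]

# sin^2 + cos^2 equals 1 up to floating-point rounding.
print(np.allclose(s ** 2 + c ** 2, 1.0))  # True
```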

Creating and shaping arrays

  • The fundamental data type in numpy is the ndarray.

  • Note that arange is numpy’s version of range, with the difference that arange returns an ndarray object.

  • The dimensions of an ndarray can be obtained via its shape attribute, which is a tuple.

  • Arrays can be reshaped via the reshape method. It returns a new ndarray object; the shape of the current ndarray object does not change.

  • hstack and vstack: The arrays are joined horizontally and vertically, respectively.
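
The points above can be sketched in a few lines, assuming numpy is installed (array names are illustrative):

```python
import numpy as np

a = np.arange(6)          # ndarray holding 0, 1, ..., 5
b = a.reshape(2, 3)       # new array object with 2 rows and 3 columns

print(type(a).__name__)   # ndarray
print(a.shape, b.shape)   # (6,) (2, 3)  (a itself keeps its shape)

h = np.hstack((b, b))     # joined side by side
v = np.vstack((b, b))     # stacked on top of each other
print(h.shape, v.shape)   # (2, 6) (4, 3)
```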

Slicing

  • Arrays can be sliced similarly to Python lists.

  • If an array has several dimensions, a slice for each dimension needs to be specified.

  • ndarrays are mutable
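
A short slicing sketch on a 3 x 4 array, assuming numpy is installed. One slice is given per dimension, and assignment changes the array in place.

```python
import numpy as np

A = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

row0 = A[0, :]       # first row           -> [0 1 2 3]
col1 = A[:, 1]       # second column       -> [1 5 9]
block = A[1:, 2:]    # lower-right block   -> [[6 7] [10 11]]

A[0, 0] = 100        # ndarrays are mutable
print(A[0, 0])       # 100
```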

Array Operations

  • Basic mathematical operators and functions act element-wise on ndarray objects.

  • Since version 3.5 of Python, it is possible to multiply two ndarrays using the @ operator (which corresponds to the np.matmul function). For matrices (two-dimensional arrays), this gives the same result as the dot method. For higher-dimensional arrays the two behave differently.

  • numpy allows arithmetic operations on arrays of different shapes (dimensions), provided the shapes are compatible; this is called broadcasting.
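
The three points above can be sketched as follows, assuming numpy is installed (the arrays are arbitrary examples):

```python
import numpy as np

x = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([[0.0, 1.0], [1.0, 0.0]])

elementwise = x * y            # [[0, 2], [3, 0]]
matprod = x @ y                # [[2, 1], [4, 3]]
same = np.allclose(matprod, x.dot(y))   # True for 2-D arrays

# For higher-dimensional arrays, @ and dot give different shapes.
P = np.ones((2, 3, 4))
Q = np.ones((2, 4, 5))
print((P @ Q).shape)           # (2, 3, 5)
print(np.dot(P, Q).shape)      # (2, 3, 2, 5)

# Arrays of different shapes: the length-2 vector is added to every row.
shifted = x + np.array([10.0, 20.0])    # [[11, 22], [13, 24]]
```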

Random numbers

  • numpy has a submodule called random
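
A minimal sketch of drawing random numbers with numpy.random, assuming a reasonably recent numpy that provides default_rng. The seed value is arbitrary and only makes the draws reproducible.

```python
import numpy as np

rng = np.random.default_rng(12345)     # seeded generator (seed is arbitrary)

u = rng.random(5)                      # 5 draws from Uniform[0, 1)
z = rng.normal(loc=0.0, scale=1.0, size=(2, 3))   # standard normal draws
dice = rng.integers(1, 7, size=10)     # integers 1..6 (upper bound excluded)

print(u.shape, z.shape, dice.shape)    # (5,) (2, 3) (10,)
```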

Matplotlib

  • Scatter plot example
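
The original figure is not reproduced here. The sketch below draws a scatter plot of simulated data, assuming matplotlib and numpy are installed. The data, the labels, and the choice of the non-interactive Agg backend are assumptions made for this example.

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)   # simulated data, seed chosen arbitrarily
x = rng.normal(size=50)
y = 2 * x + rng.normal(scale=0.5, size=50)

fig, ax = plt.subplots()
sc = ax.scatter(x, y, s=15, alpha=0.7)   # one marker per (x, y) pair
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Scatter plot of simulated data")
```

In an interactive session, calling plt.show() after these lines displays the figure.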

Pandas

  • pandas: provides a variety of tools for organizing and analyzing data, including the DataFrame class

Extracting Information

  • The apply method allows one to apply general functions to columns or rows of a DataFrame.

  • The loc indexer allows for accessing elements (or ranges) in a data frame by label.

  • count: Counts number of non-NA cells.

  • The groupby method of a DataFrame object is useful for summarizing and displaying the data by group.

  • mean: Column/row mean.
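
The tools listed above can be tried on a tiny made-up data frame, assuming pandas is installed (the column names and values are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "x": [1.0, 2.0, None, 4.0],      # one missing value
    "y": [10, 20, 30, 40],
})

value = df.loc[1, "x"]                   # single element by row/column label
sub = df.loc[0:1, ["x", "y"]]            # label ranges include both ends
counts = df[["x", "y"]].count()          # non-NA cells per column
means = df[["x", "y"]].mean()            # column means (NA skipped)
ranges = df[["x", "y"]].apply(lambda col: col.max() - col.min())
by_group = df.groupby("group")["y"].mean()   # mean of y within each group

print(counts["x"], counts["y"])          # 3 4
print(by_group["a"], by_group["b"])      # 15.0 35.0
```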

Scikit-learn

Partitioning the Data

  • The data can be split with the train_test_split function
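
A minimal sketch of a train/test split, assuming scikit-learn and numpy are installed. The toy data, the 30% test fraction, and the random_state value are choices made for this example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)     # 10 samples, 2 features (toy data)
y = np.arange(10)                    # one response per sample

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0   # 30% held out, reproducible shuffle
)

print(X_train.shape, X_test.shape)   # (7, 2) (3, 2)
```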

Standardization

  • MinMaxScaler and StandardScaler can be used
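
Both scalers can be sketched on a small made-up matrix, assuming scikit-learn and numpy are installed. With default settings, MinMaxScaler maps each column to [0, 1], and StandardScaler centers each column and divides by its population standard deviation.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])          # toy data: 3 samples, 2 features

X_mm = MinMaxScaler().fit_transform(X)     # each column rescaled to [0, 1]
X_std = StandardScaler().fit_transform(X)  # each column: mean 0, std 1

print(X_mm[:, 0])                    # [0.  0.5 1. ]
print(np.allclose(X_std.mean(axis=0), 0.0))  # True
```

In practice the scaler is usually fitted on the training data only and then applied to the test data with transform.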

References

  • Data Science and Machine Learning
 

Copyright 2025, Seoncheol Park