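For data, we decided to use two datasets: the AI HUB dataset below and YawDD.

AI HUB 운전자 탑승자 상태 및 이상행동 모니터링 (Driver and Passenger State and Abnormal Behavior Monitoring)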

The AI HUB data is so large that sharing it with the team was painful.. 🥲

🚫

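The data is distributed as a few large tar files, but uploading one to Google Drive and then extracting it from Colab does not work (apparently because it triggers too many accesses to Drive..)

Locally: 100GB → split into several tars → upload to Google Drive

Code that splits a tar into smaller tars, one per top-level directory inside it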

tar -tf TS4.tar | sed 's|/.*||' | sort -u > top_dirs_4.txt

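For anyone without `sed` at hand, the same listing step can be sketched in Python with the standard `tarfile` module. This is only a rough equivalent of the one-liner above, not what I actually ran: `top_level_entries` is a hypothetical helper name, and the ordering may differ from `sort -u` because Python sorts by code point rather than by locale.

```python
import tarfile


def top_level_entries(tar_path):
    """Return the sorted, de-duplicated first path component of every member."""
    tops = set()
    with tarfile.open(tar_path, "r") as archive:
        # Iterating reads member headers only; no file data is extracted.
        for member in archive:
            # Same idea as sed 's|/.*||': keep everything before the first "/".
            first = member.name.split("/", 1)[0]
            if first:
                tops.add(first)
    return sorted(tops)
```

Writing the result to a file such as `top_dirs_4.txt` is then one `"\n".join(...)` away.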

#!/usr/bin/env bash
# Log file
LOGFILE="log.txt"
# Redirect both stdout and stderr to log.txt
exec > "$LOGFILE" 2>&1

echo "===== Script started: $(date) ====="

# Process each top-level directory listed in top_dirs_4.txt
# Option order is arranged so that this also works with BSD tar (e.g. the default tar on macOS)

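while IFS= read -r dir; do
  echo "Processing: $dir"

  # Create a temporary extraction directory
  TEMP_DIR=$(mktemp -d)
  echo "  Temp directory: $TEMP_DIR"

  # 1) Extract only this top-level directory from TS4.tar into the temp folder
  #    (no --exclude is passed, so every file type, mp4 included, is extracted)
  #    BSD tar is sensitive to the order of -f, -C, etc., so they are spelled out explicitly
  echo "  Extracting '$dir/'..."
  tar -xv \
      -f TS4.tar \
      -C "$TEMP_DIR" \
      "$dir/"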

  # 2) Check the extraction result
  if [ ! -d "$TEMP_DIR/$dir" ]; then
    echo "  ERROR: directory '$dir' does not exist in TEMP_DIR."
    echo "  TEMP_DIR contents:"
    ls -la "$TEMP_DIR"
    rm -rf "$TEMP_DIR"
    continue
  fi

  # 3) Create a new tar file from the extracted directory
  OUTPUT_TAR="${dir}.tar"
  echo "  Creating '$OUTPUT_TAR'..."
  tar -cv \
      -f "$OUTPUT_TAR" \
      -C "$TEMP_DIR" \
      "$dir"

  # 4) Delete the temporary directory
  rm -rf "$TEMP_DIR"
  echo "  '$dir' done."

done < top_dirs_4.txt

echo "===== Script finished: $(date) ====="

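The script above extracts each directory to a temp folder and re-archives it, which needs extra disk space. A possible alternative, sketched here with Python's standard `tarfile` module, copies members straight from the big tar into per-directory tars without extracting anything. This is an untested-at-scale sketch, not what I ran on the 100GB archive: `split_tar_by_top_dir` is a hypothetical name, and it assumes the output directory holds no stale tars from an earlier run, because append mode would add to them.

```python
import os
import tarfile


def split_tar_by_top_dir(src_tar, out_dir):
    """Write one <top>.tar into out_dir for each top-level entry of src_tar."""
    os.makedirs(out_dir, exist_ok=True)
    seen = set()
    current_top, current_out = None, None
    try:
        with tarfile.open(src_tar, "r") as src:
            for member in src:
                top = member.name.split("/", 1)[0]
                if not top:
                    continue
                if top != current_top:
                    # Keep only one output open at a time. Append mode creates
                    # the file if missing and extends it if the same top-level
                    # directory shows up again later in the source archive.
                    if current_out is not None:
                        current_out.close()
                    out_path = os.path.join(out_dir, top + ".tar")
                    current_out = tarfile.open(out_path, "a")
                    current_top = top
                    seen.add(top)
                # Regular files carry data; directories and links are header-only.
                data = src.extractfile(member) if member.isfile() else None
                current_out.addfile(member, data)
    finally:
        if current_out is not None:
            current_out.close()
    return sorted(seen)
```

Calling `split_tar_by_top_dir("TS4.tar", "split")` would then leave one tar per top-level directory under `split/`, assuming the disk has room for a second copy of the data.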

In Colab: extract each small tar → save to Google Drive

Colab code

import glob
import subprocess
import os
import shutil
from google.colab import drive
from tqdm import tqdm

# ✅ Remount Google Drive (called whenever a step fails, in case the mount dropped)
def remount_drive():
    print("🔄 Lost the Google Drive connection. Remounting...")
    drive.flush_and_unmount()
    drive.mount('/content/drive')

# ✅ Directory that holds the tar archives (Google Drive)
drive_tar_dir = "/content/drive/MyDrive/DMS/data/AI-HUB/Training/원천데이터/TS4/tar/"
tar_pattern = drive_tar_dir + "*.tar"

# ✅ Colab-local directory to extract into
local_extract_dir = "/content/temp_extracted/"
os.makedirs(local_extract_dir, exist_ok=True)

# ✅ Colab-local directory for the copied `.tar` files
local_tar_dir = "/content/temp_tar/"
os.makedirs(local_tar_dir, exist_ok=True)

# ✅ Final Google Drive destination for the extracted data
drive_destination_dir = "/content/drive/MyDrive/DMS/data/AI-HUB/Training/원천데이터/TS4"

# ✅ Collect the tar files
tar_files = glob.glob(tar_pattern)
total_files = len(tar_files)

print(f"📂 Found {total_files} tar files in total.")

# ✅ Extract the tar files one at a time, showing overall progress
for i, archive in enumerate(tqdm(tar_files, desc="Extracting TAR files", unit="file")):
    print(f"\n[{i+1}/{total_files}] Copying from Google Drive to local disk: {archive}")

    # ✅ Copy the tar file from Google Drive to Colab local storage
    local_tar_file = os.path.join(local_tar_dir, os.path.basename(archive))

    try:
        shutil.copy2(archive, local_tar_file)  # copy the file
    except Exception as e:
        print(f"❌ Copy failed: {archive} (error: {e})")
        remount_drive()  # remount Google Drive
        continue

    # ✅ Count the entries inside the tar
    list_command = ["tar", "-tf", local_tar_file]
    result = subprocess.run(list_command, capture_output=True, text=True)

    if result.returncode != 0:
        print(f"  ❌ Error: {local_tar_file}")
        print(f"  ⚠️ Details: {result.stderr}")
        remount_drive()  # remount Google Drive
        continue  # move on to the next tar file

    # splitlines() avoids counting the empty string after the final newline
    file_list = result.stdout.splitlines()
    total_items = len(file_list)

    # ✅ Extract on Colab local storage
    #    -v makes tar print one line per extracted entry, which is what drives the progress bar.
    #    Excluded *.jpg entries are never printed, so the bar can stop short of the total.
    with tqdm(total=total_items, desc=f"Extracting {os.path.basename(local_tar_file)}", unit="file") as pbar:
        extract_command = ["tar", "--exclude=*.jpg", "-xvf", local_tar_file, "-C", local_extract_dir]
        process = subprocess.Popen(extract_command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)

        for _ in process.stdout:
            pbar.update(1)
        process.wait()

    if process.returncode != 0:
        print(f"  ❌ Error: {local_tar_file}")
        print(f"  ⚠️ Details: {process.stderr.read()}")
        remount_drive()  # remount Google Drive
        continue

    print(f"  ✅ Finished extracting {local_tar_file}.")

    # ✅ Move to Google Drive
    print(f"  🚀 Moving to Google Drive: {local_extract_dir} → {drive_destination_dir}")
    for item in os.listdir(local_extract_dir):
        src_path = os.path.join(local_extract_dir, item)
        dest_path = os.path.join(drive_destination_dir, item)
        shutil.move(src_path, dest_path)

    # ✅ Clean up temporary files
    print("  🧹 Cleaning up Colab local temp files...")
    os.remove(local_tar_file)  # delete the tar file once it has been extracted
    shutil.rmtree(local_extract_dir)  # delete the extracted folder, then recreate it
    os.makedirs(local_extract_dir, exist_ok=True)

print("\n✅ Finished extracting all tar files.")

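One thing to watch in the move step: `shutil.move` puts the source inside the destination when the destination directory already exists, so rerunning the cell after a partial failure can produce a nested folder or raise an error. A merge-style move is one way around that. The sketch below is an assumption about how it could be done, not part of the code I ran; `merge_move` is a hypothetical helper, and it overwrites files that already exist at the destination.

```python
import os
import shutil


def merge_move(src_dir, dest_dir):
    """Move every file under src_dir into dest_dir, merging into existing folders.

    Returns the number of files moved. Empty source folders are left behind,
    so the caller still removes src_dir afterwards.
    """
    moved = 0
    for root, _dirs, files in os.walk(src_dir):
        rel = os.path.relpath(root, src_dir)
        target_root = dest_dir if rel == "." else os.path.join(dest_dir, rel)
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            # Moving file by file means an existing destination folder is
            # reused instead of receiving a nested copy of the source folder.
            shutil.move(os.path.join(root, name), os.path.join(target_root, name))
            moved += 1
    return moved
```

In the loop above, `merge_move(local_extract_dir, drive_destination_dir)` could stand in for the per-item `shutil.move` calls.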

YawDD

is the other dataset we decided to use.

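Model considerations

I discussed the model architecture with my teammates.

Object detection with YOLO

Purpose	Detect objects such as the driver's face, hands, and phone to obtain each object's location in every frame
Implementation	Run a pretrained YOLOv4 or YOLOv5 model to detect the objects and obtain the bounding-box coordinates and class of each one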
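To make the output of this stage concrete, here is a minimal sketch of how the detections for one frame might be represented and filtered before the next stage. Nothing here calls a real YOLO API: the `Detection` class, the `group_by_class` helper, the class names, and the 0.5 confidence threshold are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    label: str         # class name, e.g. "face", "hand", "phone" (assumed names)
    confidence: float  # detector score in [0, 1]
    box: tuple         # (x1, y1, x2, y2) in pixel coordinates


def group_by_class(detections, wanted=("face", "hand", "phone"), min_conf=0.5):
    """Keep confident detections of the wanted classes, best score first."""
    grouped = {label: [] for label in wanted}
    for det in detections:
        if det.label in grouped and det.confidence >= min_conf:
            grouped[det.label].append(det)
    for dets in grouped.values():
        dets.sort(key=lambda d: d.confidence, reverse=True)
    return grouped
```

Grouping rather than keeping a single best box per class leaves room for two hands in the same frame.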

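Feature extraction with a pretrained CNN

Purpose	Take the image patch of each object detected by YOLO as input and extract a feature vector for that patch
Implementation	Use a pretrained CNN such as ResNet50, or a lightweight one such as MobileNetV2, to extract features from each image patch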

💡

Image patch - a small region cropped out of the original image

☑️

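Role of image patches

Feed only the region of interest (ROI) to the CNN after object detection
- The image cropped to the inside of a bounding box = an image patch
- Feeding only the detected objects to the CNN cuts down on irrelevant background information

Use for object-based feature extraction
- When judging "is the driver using a phone while driving", feeding image patches to the CNN instead of the whole image lets it focus on the relevant region, so it can learn phone use more accurately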
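The cropping step described above can be sketched in a few lines. This version treats a frame as a plain list of pixel rows so that it runs without any dependencies; with a NumPy array the crop would simply be `frame[y1:y2, x1:x2]`. The `crop_patch` name and the `(x1, y1, x2, y2)` box layout are assumptions for illustration.

```python
def crop_patch(frame, box):
    """Cut the bounding box out of a frame given as a list of pixel rows.

    box is (x1, y1, x2, y2) with the right and bottom edges exclusive.
    Coordinates are clamped to the frame, and an empty list is returned
    when the box lies entirely outside it.
    """
    height = len(frame)
    width = len(frame[0]) if height else 0
    x1, y1, x2, y2 = (int(v) for v in box)
    x1, x2 = max(0, x1), min(width, x2)
    y1, y2 = max(0, y1), min(height, y2)
    if x1 >= x2 or y1 >= y2:
        return []
    return [row[x1:x2] for row in frame[y1:y2]]
```

Clamping matters because detector boxes can extend slightly past the image border.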

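Temporal pattern learning with LSTM

Purpose	Take the sequence of feature vectors extracted from consecutive frames as input and learn temporal patterns to classify the driver's state
Implementation	Model temporal dependencies with LSTM or bidirectional LSTM layers, then classify into four classes (phone call, texting, drowsy driving, normal driving) through an output layer with a softmax activation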
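The LSTM itself needs a deep learning framework, so it is left out here, but the steps around it are easy to sketch: slicing per-frame feature vectors into fixed-length sequences for the input, and turning the four output scores into a class label with softmax. The function names, the window length and stride parameters, and the English class names are assumptions for illustration.

```python
import math

# The four target classes, in an assumed output order.
CLASSES = ("phone call", "texting", "drowsy driving", "normal driving")


def make_windows(frame_features, seq_len, stride):
    """Slice per-frame features into overlapping sequences of seq_len frames."""
    last_start = len(frame_features) - seq_len
    return [frame_features[s:s + seq_len] for s in range(0, last_start + 1, stride)]


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits):
    """Return the most likely class name and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs[best]
```

Each window produced by `make_windows` would be one training or inference sample for the sequence model, and `predict_label` would be applied to that model's four output scores.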

Model considerations (YOLO + CNN + LSTM)

Model considerations

Tags

sol’s blog
