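To classify texting among abnormal driver behaviors, we use L2CS-Net, a model that estimates the yaw (horizontal angle) and pitch (vertical angle) of the gaze, to detect that the gaze keeps dropping downward while the driver is texting.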

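Problems

I originally used the official PyTorch implementation of L2CS-Net (Ahmednull/L2CS-Net), but the values it produced were hard to make sense of.

Direction and sign look wrong

The yaw and pitch extracted from the project data (DMD, YawDD) disagree in direction and sign with the demo video posted on Roboflow.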

Yaw and pitch signs obtained on the project data (DMD, YawDD)

What they should look like

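Yaw and pitch are swapped

It often looked as though the yaw and pitch values had been swapped.

The subject is looking down, so why is pitch 0° and yaw 0.41°?

The gaze angles are very different, yet the second and third photos have nearly the same yaw.

The official repo's issue tracker also has an issue reporting that yaw and pitch are swapped in the code.

→ Checking the code confirmed that they are indeed swapped.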

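The angle magnitudes are too small

The yaw and pitch values fall between 0 and 0.7.

According to the L2CS-Net paper, L2CS-Net reports angles in degrees (°).

It is a classification-based model: yaw and pitch are each classified into one of 90 classes.
- The network emits one score per class, and softmax turns these scores into a class probability distribution.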

Each class represents a fixed angle interval (→ 90 bins of 4 degrees each).
Class 0	-180°
Class 1	-176°
Class 2	-172°
…
Class 89	+176°

→ Finally, the expected bin index is computed from the softmax distribution, multiplied by 4, and shifted by -180 to produce a continuous value in degrees. → The final result is in degrees (°).

💡

Formula for the final result

Predicted Angle (degree) = (Softmax Expected Index) * 4 - 180

Softmax Expected Index: a smooth average weighted by the whole softmax distribution. Using only the most probable class (argmax) would make the output jump in 4-degree steps.
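The formula can be sketched in plain Python as follows. This is a minimal, framework-free illustration rather than the repository's code: `softmax`, `decode_angle`, `decode_angle_argmax`, and the synthetic logits are all names and values assumed here for the example.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode_angle(logits, bin_width=4.0, offset=-180.0):
    """Expected bin index under softmax, mapped to degrees."""
    probs = softmax(logits)
    expected_index = sum(i * p for i, p in enumerate(probs))
    return expected_index * bin_width + offset

def decode_angle_argmax(logits, bin_width=4.0, offset=-180.0):
    """Hard decision: only the most probable bin is used."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return best * bin_width + offset

# Synthetic logits for 90 bins (assumed values): almost all probability
# mass is split equally between bins 40 and 41.
logits = [-100.0] * 90
logits[40] = 0.0
logits[41] = 0.0

soft_angle = decode_angle(logits)         # about -18.0, halfway between the bins
hard_angle = decode_angle_argmax(logits)  # -20.0, snaps to bin 40
print(soft_angle, hard_angle)
```

With the mass split between two neighboring bins, the expectation lands between them, while argmax can only return a bin boundary.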

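Looking at the L2CS-Net repo code (test.py), the final result is computed differently from the formula above: it uses a bin width of 3 and an offset of -42 (its comment maps bins 0 to 28 onto -42 to 42), and then converts the result to radians.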

test.py code

# Continuous predictions
pitch_predicted = softmax(gaze_pitch)
yaw_predicted = softmax(gaze_yaw)

# mapping from binned (0 to 28) to angels (-42 to 42)                
pitch_predicted = \
    torch.sum(pitch_predicted * idx_tensor, 1).cpu() * 3 - 42
yaw_predicted = \
    torch.sum(yaw_predicted * idx_tensor, 1).cpu() * 3 - 42


pitch_predicted = pitch_predicted*np.pi/180
yaw_predicted = yaw_predicted*np.pi/180

python
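To see how far apart the two mappings are, the sketch below evaluates both at their lowest and highest bin indices. `bin_to_degrees` is a hypothetical helper; the index endpoints are taken from the comment in the test.py excerpt and from the class table earlier in this post.

```python
def bin_to_degrees(index, bin_width, offset):
    """Map a (possibly fractional) bin index to an angle in degrees."""
    return index * bin_width + offset

# test.py mapping: bin width 3, offset -42 (indices 0 to 28, per its comment)
narrow = (bin_to_degrees(0, 3, -42), bin_to_degrees(28, 3, -42))

# Class-table mapping: bin width 4, offset -180 (classes 0 to 89)
wide = (bin_to_degrees(0, 4, -180), bin_to_degrees(89, 4, -180))

print("test.py mapping:", narrow)    # (-42, 42)
print("class-table mapping:", wide)  # (-180, 176)
```

The same expected index decodes to very different angles under the two sets of constants, so the bin width and offset presumably have to match the configuration the weights were trained with.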

The pipeline.py used in the project computes the angle correctly, but converts it to radians before returning it. → The angles looked small because the unit was radians.

pipeline.py code

def predict_gaze(self, frame: Union[np.ndarray, torch.Tensor]):
    
    # Prepare input
    if isinstance(frame, np.ndarray):
        img = prep_input_numpy(frame, self.device)
    elif isinstance(frame, torch.Tensor):
        img = frame
    else:
        raise RuntimeError("Invalid dtype for input")

    # Predict 
    gaze_pitch, gaze_yaw = self.model(img)
    pitch_predicted = self.softmax(gaze_pitch)
    yaw_predicted = self.softmax(gaze_yaw)
    
    # Get continuous predictions in degrees.
    pitch_predicted = torch.sum(pitch_predicted.data * self.idx_tensor, dim=1) * 4 - 180
    yaw_predicted = torch.sum(yaw_predicted.data * self.idx_tensor, dim=1) * 4 - 180
    
    pitch_predicted= pitch_predicted.cpu().detach().numpy()* np.pi/180.0
    yaw_predicted= yaw_predicted.cpu().detach().numpy()* np.pi/180.0

    return pitch_predicted, yaw_predicted

python
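A quick check makes the unit issue concrete. The sketch below reinterprets values from the observed 0 to 0.7 range as radians and converts them to degrees with the standard library; the variable names are chosen here for illustration.

```python
import math

# Values in the range reported above, interpreted as radians
for value in (0.0, 0.41, 0.7):
    print(f"{value:.2f} rad = {math.degrees(value):.2f} deg")

yaw_deg = math.degrees(0.41)   # the 0.41 reading from the photo example
upper_deg = math.degrees(0.7)  # upper end of the observed range
```

Read as radians, 0.41 is roughly 23.5 degrees and 0.7 is roughly 40.1 degrees, which are plausible gaze angles rather than near-zero ones.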

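Solutions

Use a different model

I found an improved L2CS repo.

Confirmed that its code returns values in the range -180 to 180 degrees ✅

Model with better performance than before ✅

* marks the performance of the model from the previously used repo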

Ahmednull/L2CS-Net vs Shohruh72/L2CS-Net

Applying the newly found model and comparing the two:

Ahmednull/L2CS-Net

Yaw: -1.23 radian
Pitch: -0.15 radian

Shohruh72/L2CS-Net

Yaw: -5.14 degrees
Pitch: -37.45 degrees

Code written with reference to Shohruh72/L2CS-Net

import torch
import cv2
import numpy as np
from torchvision import transforms
from PIL import Image
from retinaface import RetinaFace
import matplotlib.pyplot as plt
from utils import util


def infer_and_draw_gaze(img_path):
    """Load the model, detect faces, predict gaze, and draw the bbox and gaze arrow in one call."""

    num_bins = 90
    model_path = './data/model/L2CSNET/best.pt'

    # Load the model (the checkpoint stores the full module under the 'model' key)
    model = torch.load(model_path, map_location='cpu')
    model = model['model'].float()
    model.eval()

    faces = RetinaFace.detect_faces(img_path)
    softmax = torch.nn.Softmax(dim=1)

    # Bin indices 0..89, used to take the expectation over the softmax output
    idx_tensor = torch.arange(num_bins, dtype=torch.float32)

    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])

    # Load the image
    frame = cv2.imread(img_path)
    if frame is None:
        raise FileNotFoundError(f"Image not found: {img_path}")

    if len(faces) == 0:
        print("No face detected!")
        return None, None

    print(faces)

    # Convert to color if the image is grayscale
    if frame.ndim == 2 or frame.shape[2] == 1:
        frame = cv2.cvtColor(frame, cv2.COLOR_GRAY2BGR)

    # Create the canvas once so drawings for every face are kept
    image_out = frame.copy()

    # Stay None if every detection falls below the score threshold
    yaw_deg, pitch_deg = None, None

    for face in faces.values():
        box = face['facial_area']
        if face['score'] < 0.95:
            continue

        x_min, y_min = int(box[0]), int(box[1])
        x_max, y_max = int(box[2]), int(box[3])

        # Optional: expand the face box
        # x_min = max(0, x_min - int(0.2 * bbox_height))
        # y_min = max(0, y_min - int(0.2 * bbox_width))
        # x_max = min(frame.shape[1], x_max + int(0.2 * bbox_height))
        # y_max = min(frame.shape[0], y_max + int(0.2 * bbox_width))

        bbox_width = abs(x_max - x_min)
        bbox_height = abs(y_max - y_min)

        # Crop the face and preprocess it. cv2 loads BGR, so convert to RGB
        # explicitly; PIL's convert('RGB') does not reorder the channels.
        face_img = cv2.cvtColor(frame[y_min:y_max, x_min:x_max, :], cv2.COLOR_BGR2RGB)
        img_tensor = transform(Image.fromarray(face_img)).unsqueeze(0)

        with torch.no_grad():
            yaw, pitch = model(img_tensor)

        # Expected bin index -> continuous angle in degrees
        yaw_angle = torch.sum(softmax(yaw) * idx_tensor, dim=1) * 4 - 180
        pitch_angle = torch.sum(softmax(pitch) * idx_tensor, dim=1) * 4 - 180
        yaw_deg, pitch_deg = yaw_angle.item(), pitch_angle.item()

        print(f"Yaw: {yaw_deg:.2f} degrees")
        print(f"Pitch: {pitch_deg:.2f} degrees")

        # Radians are only needed for the trigonometry below
        yaw_rad, pitch_rad = np.deg2rad(yaw_deg), np.deg2rad(pitch_deg)

        # Bounding box center
        pos = (int(x_min + bbox_width / 2.0), int(y_min + bbox_height / 2.0))

        # Gaze direction; the arrow length follows the box width
        length = bbox_width
        dx = -length * np.sin(yaw_rad) * np.cos(pitch_rad)
        dy = -length * np.sin(pitch_rad)

        # Draw the arrow
        end_point = (int(round(pos[0] + dx)), int(round(pos[1] + dy)))
        cv2.arrowedLine(image_out, pos, end_point, (0, 0, 255), 2, cv2.LINE_AA, tipLength=0.18)

        # Draw the bounding box
        cv2.rectangle(image_out, (x_min, y_min), (x_max, y_max), (0, 255, 0), 2)

        # Draw the landmarks: cv2.circle(img, center, radius, color, thickness)
        for value in face["landmarks"].values():
            x, y = int(value[0]), int(value[1])
            cv2.circle(image_out, (x, y), 1, (0, 255, 0), 1)

    image_rgb = cv2.cvtColor(image_out, cv2.COLOR_BGR2RGB)
    plt.figure(figsize=(4, 4))
    plt.imshow(image_rgb)
    plt.axis('off')
    plt.show()

    # Angles (degrees) of the last face that passed the score threshold
    return yaw_deg, pitch_deg

python
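The arrow-drawing step in the function above can be checked in isolation. The sketch below reuses the same `dx`/`dy` formulas with only the standard library; `gaze_arrow` is a hypothetical helper, and the length of 100 pixels is an arbitrary example value.

```python
import math

def gaze_arrow(yaw_deg, pitch_deg, length):
    """Project gaze angles (degrees) to an image-plane offset (dx, dy) in pixels."""
    yaw = math.radians(yaw_deg)
    pitch = math.radians(pitch_deg)
    dx = -length * math.sin(yaw) * math.cos(pitch)
    dy = -length * math.sin(pitch)
    return dx, dy

# Angles reported above for Shohruh72/L2CS-Net; length=100 is arbitrary
dx, dy = gaze_arrow(-5.14, -37.45, 100)
print(f"dx={dx:.1f}, dy={dy:.1f}")
```

Image coordinates grow downward, so a negative pitch gives a positive `dy` and the arrow points toward the bottom of the frame, which is the direction expected for a driver looking down at a phone.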

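Ahmednull/L2CS-Net seems to be the more accurate of the two.

Use Ahmednull/L2CS-Net with corrected return values

Swap the yaw and pitch values returned by the model back into their correct order before using them.

Convert radians to degrees when interpreting or visualizing the results.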
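A small wrapper along these lines could apply both corrections in one place. This is only a sketch: `fix_gaze_output` is a hypothetical helper, and it assumes that the value the pipeline labels pitch is really yaw and vice versa, and that both are in radians.

```python
import math

def fix_gaze_output(returned_pitch, returned_yaw):
    """Undo the yaw/pitch swap and add degree versions of a radian output.

    Assumption: the value labelled pitch is really yaw, and vice versa.
    """
    yaw_rad, pitch_rad = returned_pitch, returned_yaw
    return {
        "yaw_rad": yaw_rad,
        "pitch_rad": pitch_rad,
        "yaw_deg": math.degrees(yaw_rad),      # for interpretation and plots
        "pitch_deg": math.degrees(pitch_rad),
    }

# Example values chosen for illustration only
fixed = fix_gaze_output(returned_pitch=0.5, returned_yaw=-0.1)
print(fixed)
```

Keeping both units in the result lets the radian values go to downstream code while the degree values are used for reading and plotting.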

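🔎 radian vs degree: which is better suited to model training?

Item	radian	degree
Typical use	Standard for mathematical operations and neural network implementations	Familiar to people; less suited to a model's internal computation
Value range	-π to π (about -3.14 to 3.14)	-180 to +180
Early training stability	Converges faster and more smoothly	Initial loss is large, so training may oscillate
Sensitivity to small gaze changes	Sensitive to small value changes; small changes contribute little to the loss, which allows fine adjustment	Relatively insensitive; even small changes produce a large loss, so updates are coarser

→ I plan to use degrees for interpretation and visualization, and to feed radians to the model.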
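Degrees and radians differ only by the constant factor 180/π, so the effect on a squared-error loss can be checked directly. The sketch below assumes a mean-squared-error loss, which this post does not specify, and uses made-up sample angles; `mse` is a helper defined here.

```python
import math

def mse(pred, target):
    """Mean squared error between two equal-length lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

# Made-up predictions and targets, in degrees
pred_deg = [10.0, -5.0, 30.0]
target_deg = [12.0, -4.0, 27.0]

# The same values expressed in radians
pred_rad = [math.radians(v) for v in pred_deg]
target_rad = [math.radians(v) for v in target_deg]

loss_deg = mse(pred_deg, target_deg)
loss_rad = mse(pred_rad, target_rad)
ratio = loss_deg / loss_rad  # equals (180 / pi) ** 2, about 3282.8

print(f"loss in degrees: {loss_deg:.4f}")
print(f"loss in radians: {loss_rad:.6f}")
print(f"ratio: {ratio:.1f}")
```

Under this assumption the same errors produce a loss about 3283 times larger in degrees, which is consistent with the larger initial loss noted in the table.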

Ahmednull/L2CS-Net model errors

sol’s blog
