[Paper Review] Computer Vision - RetinaNet (Focal Loss for Dense Object Detection)

📃 Paper: https://arxiv.org/pdf/1708.02002.pdf

Object detection models estimate candidate object regions in an image, split them into positive/negative samples according to an IoU threshold, and train on those samples
Because an image usually contains only a few objects, positive samples (object regions) are far fewer than negative samples (background regions) → this large gap between positive and negative samples causes the class imbalance problem
class imbalance
- easy negatives (samples whose class is easy to predict) are overwhelmingly numerous, so they dominate training and degrade model performance
- because the model can already classify these samples easily, they contribute little useful learning signal and training becomes inefficient
- How two-stage detectors address class imbalance
  1. narrow down region proposals so that most background samples are filtered out
    - selective search
    - edgeboxes
    - deepmask
    - RPN
  2. apply sampling heuristics that keep a reasonable ratio of positive/negative samples
    - hard negative mining
    - OHEM
- one-stage detectors have no region proposal step and instead perform dense sampling, sweeping densely over the whole image, so they generate far more candidate regions than two-stage detectors (the class imbalance problem is more severe than in two-stage detectors)
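
The positive/negative split by IoU threshold described at the start of this section can be sketched roughly as below. This is only an illustration: the 0.5 / 0.4 thresholds follow the anchor assignment rule in the RetinaNet paper (they are not stated in this post), and the candidate grid and ground-truth box are made-up values.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_candidates(candidates, gt_boxes, pos_thresh=0.5, neg_thresh=0.4):
    """Label each candidate by its best IoU with any ground-truth box.

    Thresholds are assumptions taken from the RetinaNet paper:
    IoU >= 0.5 is positive, IoU < 0.4 is negative, anything between is ignored.
    """
    labels = []
    for cand in candidates:
        best = max((iou(cand, gt) for gt in gt_boxes), default=0.0)
        if best >= pos_thresh:
            labels.append("positive")
        elif best < neg_thresh:
            labels.append("negative")
        else:
            labels.append("ignored")
    return labels


# Hypothetical setup: one object in a 128x128 image, and a dense grid of
# 32x32 candidate boxes placed every 16 pixels.
gt_boxes = [(40, 40, 80, 80)]
candidates = [(x, y, x + 32, y + 32)
              for y in range(0, 97, 16)
              for x in range(0, 97, 16)]
labels = label_candidates(candidates, gt_boxes)

for name in ("positive", "negative", "ignored"):
    print(name, labels.count(name))
```

Even on this tiny grid, nearly all candidates end up as background, which is the imbalance the list above describes.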

💡 This paper identifies class imbalance during training as the main problem and proposes a new loss function, applicable to one-stage detectors, to address it

Preview

Focal loss

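A cross entropy loss with an added dynamic scaling factor that shrinks as the model's confidence in the correct class grows
During training it automatically down-weights the contribution of easy examples, so that learning focuses on hard examples
To test the effect of focal loss, the paper designs RetinaNet, a one-stage detector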
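
To get a feel for the scaling factor, here is a minimal sketch that evaluates the factor (1 - p_t)^γ from the paper's focal loss definition, where p_t is the predicted probability of the true class. The value γ = 2 is the setting the paper reports as working best; it is not mentioned in this post, so treat it as an assumption here.

```python
def modulating_factor(p_t, gamma=2.0):
    """Focal loss scaling factor (1 - p_t) ** gamma.

    p_t is the predicted probability of the true class, so a large p_t
    means an easy, well-classified example.
    """
    return (1.0 - p_t) ** gamma


# Easy examples (p_t close to 1) get a factor close to 0;
# hard examples (p_t close to 0) keep a factor close to 1.
for p_t in (0.1, 0.5, 0.9, 0.99):
    print(f"p_t={p_t:.2f}  factor={modulating_factor(p_t):.4f}")
```

With γ = 0 the factor is always 1, which recovers plain cross entropy.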

Main Ideas

1) Focal loss vs Balanced Cross Entropy

CE loss
- used for binary classification
- weights the prediction for every sample equally
- even a sample that is easily classified still incurs a non-trivial loss
Balanced Cross Entropy
- multiplies the CE loss by a weighting parameter (α for the positive class, 1-α for the negative class)
Focal loss
- used to address the extreme class imbalance (e.g. 1:1000) between foreground and background classes in one-stage detectors
- a loss function that down-weights easy examples so that training focuses on hard negatives
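
The three losses above can be compared with a small sketch. The formulas follow the paper's binary definitions; the α and γ values (α = 0.75 for balanced CE, α = 0.25 and γ = 2 for focal loss) are the best settings from the paper's ablation rather than anything stated in this post, and the toy batch (1000 easy negatives plus one hard positive) is invented to mimic the 1:1000 imbalance.

```python
import math


def cross_entropy(p, y):
    """Binary CE. p is the predicted probability of the positive class, y is 0 or 1."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def balanced_cross_entropy(p, y, alpha):
    """CE weighted by alpha for positives and (1 - alpha) for negatives."""
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return alpha_t * cross_entropy(p, y)


def focal_loss(p, y, alpha, gamma):
    """Alpha-balanced focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def easy_share(loss_fn, easy, hard):
    """Fraction of the total loss that comes from the easy samples."""
    easy_total = sum(loss_fn(p, y) for p, y in easy)
    hard_total = sum(loss_fn(p, y) for p, y in hard)
    return easy_total / (easy_total + hard_total)


# Toy batch (made-up numbers): 1000 confidently rejected background samples
# and a single object that the model currently scores at only 0.1.
easy_negatives = [(0.01, 0)] * 1000
hard_positives = [(0.1, 1)]

share_ce = easy_share(cross_entropy, easy_negatives, hard_positives)
share_bce = easy_share(lambda p, y: balanced_cross_entropy(p, y, alpha=0.75),
                       easy_negatives, hard_positives)
share_fl = easy_share(lambda p, y: focal_loss(p, y, alpha=0.25, gamma=2.0),
                      easy_negatives, hard_positives)

print(f"easy-negative share of loss  CE: {share_ce:.3f}  "
      f"balanced CE: {share_bce:.3f}  focal: {share_fl:.4f}")
```

In this toy batch the easy negatives account for most of the plain CE total, a smaller but still large part under balanced CE, and well under 1% under focal loss, so the single hard positive drives the update.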

2) RetinaNet

one-stage detector
Consists of a single backbone network and two subnetworks, which perform classification and bounding box regression respectively

Training RetinaNet

1) Feature Pyramid by ResNet + FPN

The image is fed into the backbone network, which outputs a feature pyramid with 5 different scales
The backbone network is an FPN (Feature Pyramid Network) built on ResNet
Pyramid levels are set to P3~P7

Input : image
Process : feature extraction by ResNet + FPN
Output : feature pyramid(P3~P7)
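
As a rough illustration of what the pyramid looks like, the sketch below computes the spatial size of each level. It relies on the paper's convention that level P_l has a resolution 2^l times lower than the input; the ceiling rounding, the 640x640 input size, and the helper name are assumptions for the example, and 9 anchors per location is the A=9 setting the paper uses.

```python
import math


def pyramid_shapes(height, width, levels=range(3, 8)):
    """Spatial size (H, W) of each pyramid level, assuming stride 2**level.

    Rounding up is an assumption; the exact size depends on padding choices.
    """
    shapes = {}
    for level in levels:
        stride = 2 ** level
        shapes[f"P{level}"] = (math.ceil(height / stride),
                               math.ceil(width / stride))
    return shapes


shapes = pyramid_shapes(640, 640)
for name, (h, w) in shapes.items():
    print(f"{name}: {h}x{w}")

# Every cell of every level hosts anchors, which is where dense sampling
# gets its very large number of candidates.
total_locations = sum(h * w for h, w in shapes.values())
total_anchors = total_locations * 9  # A = 9 anchors per location
print("locations:", total_locations, "anchors:", total_anchors)
```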

2) Classification by Classification subnetwork

The feature map from each pyramid level obtained in step 1) is fed into the classification subnetwork
The subnet consists of four 3x3 conv layers with C channels, each followed by ReLU, and a final 3x3 conv layer with KxA channels
- K = number of classes
- A = number of anchor boxes (the paper uses A=9)
Finally, a sigmoid activation function is applied at each spatial location (cell) of the resulting feature map
This yields 5 feature maps (one per pyramid level) with KxA channels

Input : feature pyramid(P3~P7)
Process : classification by classification subnetwork
Output : 5 feature maps with KxA channels
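
To make the KxA output more concrete, here is a small sketch of how the channels at one spatial location could be read as per-anchor, per-class scores. The anchor-major channel layout (index = anchor * K + class) and the helper names are assumptions for illustration; actual implementations may order the channels differently.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def scores_at_location(logits, num_classes, num_anchors):
    """Split the K*A logits of one cell into A lists of K class probabilities.

    Each score goes through its own sigmoid, so the K scores of an anchor
    are independent and do not need to sum to 1 (unlike softmax).
    """
    assert len(logits) == num_classes * num_anchors
    return [
        [sigmoid(logits[a * num_classes + k]) for k in range(num_classes)]
        for a in range(num_anchors)
    ]


K, A = 3, 9                  # toy class count; A = 9 as in the paper
logits = [0.0] * (K * A)     # made-up logits for a single cell
logits[4 * K + 2] = 5.0      # anchor 4 is confident about class 2

scores = scores_at_location(logits, K, A)
print(len(scores), "anchors x", len(scores[0]), "classes")
print("anchor 4:", [round(s, 3) for s in scores[4]])
```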

3) Bounding box regression by Bounding box regression subnetwork

The feature map from each pyramid level obtained in step 1) is fed into the bounding box regression subnetwork
The channel count is set so that the feature map encodes 4 coordinate values (x, y, w, h) per anchor box
The final result is 5 feature maps with 4xA channels

Input : feature pyramid(P3~P7)
Process : bounding box regression by bounding box regression subnet
Output : 5 feature maps with 4xA channels
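
The 4 values per anchor are offsets relative to the anchor box rather than absolute coordinates. A minimal sketch of turning them back into a box is shown below, assuming the standard R-CNN box parameterization that the paper says it follows. The (cx, cy, w, h) center format and the function name are choices made for this example.

```python
import math


def decode_box(anchor, deltas):
    """Apply predicted offsets to an anchor box.

    anchor: (cx, cy, w, h) in pixels
    deltas: (dx, dy, dw, dh), where dx/dy shift the center in units of the
            anchor size and dw/dh scale the size in log space
    """
    acx, acy, aw, ah = anchor
    dx, dy, dw, dh = deltas
    cx = acx + dx * aw
    cy = acy + dy * ah
    w = aw * math.exp(dw)
    h = ah * math.exp(dh)
    return (cx, cy, w, h)


anchor = (50.0, 50.0, 20.0, 40.0)
print(decode_box(anchor, (0.0, 0.0, 0.0, 0.0)))            # zero offsets keep the anchor
print(decode_box(anchor, (0.5, -0.25, math.log(2), 0.0)))  # shifted and twice as wide
```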

Inference

RetinaNet was trained on the COCO dataset with different loss functions, and AP was measured for each
Result: CE loss 30.2%, Balanced Cross Entropy 31.1%, Focal loss 34% AP
Compared against the OHEM variant used in SSD, with a positive/negative ratio of 1:3 and NMS threshold=0.5, focal loss achieved an AP 3.2 points higher

Focal loss addresses the class imbalance problem more effectively than previous approaches
