Translate intermediate_source/reinforcement_ppo.py by ptesogno · Pull Request #1139 · PyTorchKorea/tutorials-kr

ptesogno · 2026-05-17T12:35:04Z

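License agreement

You must agree that the BSD 3-Clause License applies to the changes you contribute.

For more details, please see the contributing guide.

If you agree, change the [ ] below to [x].

I have read the contributing guide and agree that the BSD 3-Clause License applies to the contents of this PR.

PR type

Change the [ ] in front of the type that applies to this PR to [x].

A contribution that fixes typos or improves a translation
A contribution that translates an untranslated tutorial
A contribution that reflects changes from the official tutorial
A contribution not covered by the types above

PR description

Briefly describe what this PR changes.
Translated the intermediate_source/reinforcement_ppo.py document.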

ehdtjr · 2026-05-20T15:10:39Z

Thank you for your hard work on the translation!
Below are the corrections I found; I hope they are useful.

Row 48: 근접성 제약(proximality constraint)를 -> 근접성 제약(proximality constraint)을
Row 92: 필수적인정책 -> 필수적인 정책
Row 343: Tahn-Normal -> Tanh-Normal
Rows 418, 470, 497, 549, 551, 666: 훈련 -> 학습 (see the terminology usage rules)

testofschool · 2026-05-20T09:34:08Z

-# the better.
+# 데이터를 수집할 때 ``frames_per_batch`` 매개변수를 정의해 각 배치의 크기를
+# 결정할 수 있습니다. 사용할 수 있는 프레임 수(시뮬레이터와 상호작용하는 횟수 등) 또한 정의합니다.
+# 일반적으로 강화학습 알고리즘의 목표는 환경 상호작용동안 최대한 빠르게is to learn to solve the task


The English text was not deleted and is still present.
환경 상호작용동안 최대한 빠르게is to learn to solve the task → 환경과의 상호작용 측면에서 최대한 빠르게 문제를 해결하도록 학습하는 것입니다.

testofschool · 2026-05-20T09:34:28Z

-  - How to compute the advantage signal for policy gradient methods;
-  - How to create a stochastic policy using a probabilistic neural network;
-  - How to create a dynamic replay buffer and sample from it without repetition.
+  - 정책 변화도(policy gradient) 메서드에서 advatage 신호를 계산하는 방법


Typo: advatage → advantage

testofschool · 2026-05-20T09:34:44Z

-# thing we need to care about is to build a neural network that outputs the
-# right number of parameters for the policy to work with (a location, or mean,
-# and a scale):
+# 데이터가 연속적(continuous)이므로, 우리는 액션 공간의 경계를 준수하기 위해 Tahn-Normal 분포를


Typo: Tahn-Normal → Tanh-Normal

testofschool · 2026-05-20T09:35:43Z

-# 3. Next, we will design the policy network and the value model,
-#    which is indispensable to the loss function. These modules will be used
-#    to configure our loss module.
+# 3. 이후 손실 함수에 필수적인정책 네트워크와 가치 모델(value model)을 설계합니다.


Spacing: 필수적인정책 → 필수적인 정책

testofschool · 2026-05-20T09:39:45Z

+#    이 모듈은 손실 모듈을 구성하는 데 사용될 것입니다.
 #
-# 4. Next, we will create the replay buffer and data loader.
+# 4. 다음으로 응답 버퍼와 데이터 로더를 생성합니다.


Line 483 uses '리플레이 버퍼(Replay buffer)', so it would be good to use "리플레이 버퍼" here as well for consistency!

Fixed. Thank you!

ptesogno · 2026-05-24T12:42:55Z

Thank you for your hard work on the translation! Below are the corrections I found; I hope they are useful.

Row 48: 근접성 제약(proximality constraint)를 -> 근접성 제약(proximality constraint)을
Row 92: 필수적인정책 -> 필수적인 정책
Row 343: Tahn-Normal -> Tanh-Normal
Rows 418, 470, 497, 549, 551, 666: 훈련 -> 학습 (see the terminology usage rules)

Fixed. Thank you!

Translate intermediate_source/reinforcement_ppo.py

d3c2c25

testofschool reviewed May 20, 2026

Apply review feedback

0a6524c

ptesogno wants to merge 2 commits into PyTorchKorea:master from ptesogno:main

3 participants
