11.13, 11.20
Instructor: Yaodong Yang
Topics Covered
Human Preference Collection
Preference Modeling
Bradley-Terry Model
Reinforcement Learning from Human Feedback
Direct Preference Optimization