RLHF alignment applied to GPT-3 — the precursor to ChatGPT
GPT-3 fine-tuned with reinforcement learning from human feedback (RLHF) to follow instructions