Current LLM alignment approaches rely heavily on preference learning and RLHF.
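To make "preference learning" concrete, here is a minimal sketch of the pairwise (Bradley-Terry style) objective that reward models in typical RLHF pipelines are trained with. The function and variable names are illustrative, not taken from any particular library or from the original post.

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen: torch.Tensor,
                    reward_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise preference loss: maximize log sigmoid(r(chosen) - r(rejected)),
    pushing the reward model to score human-preferred completions higher."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage: scalar rewards for a batch of (chosen, rejected) completion pairs.
r_chosen = torch.tensor([1.2, 0.4, 0.9])
r_rejected = torch.tensor([0.3, 0.5, -0.1])
print(preference_loss(r_chosen, r_rejected).item())
```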
I think everyone in the AI safety community would like something like this, but how do you expect them to implement it?