OpenAI has acknowledged that its existing guardrails work well in shorter conversations, but that they may become unreliable in lengthy interactions… The company also announced on Tuesday that it will try to improve the way ChatGPT responds to users exhibiting signs of “acute distress” by routing conversations showing such moments to its reasoning models, which the company says follow and apply safety guidelines more consistently.
I hope the AI-chat companies really get a handle on this. They are making helpful-sounding noises, but it's hard to know how much they are actually prioritizing it.
So I can get GPT premium for free if I tell it I’m going to kill myself unless it does what I say?