It’s more likely that the fine tuning for this model tends to use emojis, particularly when apologizing, and so it just genuinely spits them out from time to time.
But then when the context is about being told not to use them, it falls into a pattern of trying to apologize/explain/rationalize.
It’s a bit like the scorpion and the frog - it’s very hard to get a thing to change its nature, and find tuned LLMs definitely encode a nature. Which then leads to things like Bing going into emoji loops or ChatGPT being ‘lazy’ by saying it can’t convert a spreadsheet because it’s just a LLM (which a LLM should be able to do).
The neat thing here is the way that continued token generation ends up modeling a stream of consciousness blabbering. It keeps not stopping because it thinks it needs to apologize given the context, but because of the fine tuning can’t apologize because it will use an emoji when apologizing which then it sees and needs to apologize for until it flips to trying to explain it as an intended joke (almost modeling the way humans confabulate when their brain doesn’t know why it did something and subconsciously comes up with a BS explanation for it). But then it still can’t stop with the emojis.
It’s a good reminder that LLMs often obey their embedded nature more than they do in context rules or instructions.
It’s more likely that the fine tuning for this model tends to use emojis, particularly when apologizing, and so it just genuinely spits them out from time to time.
But then when the context is about being told not to use them, it falls into a pattern of trying to apologize/explain/rationalize.
It’s a bit like the scorpion and the frog - it’s very hard to get a thing to change its nature, and find tuned LLMs definitely encode a nature. Which then leads to things like Bing going into emoji loops or ChatGPT being ‘lazy’ by saying it can’t convert a spreadsheet because it’s just a LLM (which a LLM should be able to do).
The neat thing here is the way that continued token generation ends up modeling a stream of consciousness blabbering. It keeps not stopping because it thinks it needs to apologize given the context, but because of the fine tuning can’t apologize because it will use an emoji when apologizing which then it sees and needs to apologize for until it flips to trying to explain it as an intended joke (almost modeling the way humans confabulate when their brain doesn’t know why it did something and subconsciously comes up with a BS explanation for it). But then it still can’t stop with the emojis.
It’s a good reminder that LLMs often obey their embedded nature more than they do in context rules or instructions.