Explanations
bot login messages and greetings
Negative Logits
жном  0.67
冑  0.65
откуда  0.63
*  0.62
ൾ  0.62
Weiter  0.61
дов  0.59
жно  0.57
From  0.56
loadConst  0.56
Positive Logits
gossip  1.00
chatted  0.94
immoral  0.88
ჩატი  0.87
chat  0.87
CHAT  0.86
detractors  0.85
opponents  0.85
questioned  0.84
contatto  0.84
Activation Density: 0.006%
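The density figure is easier to interpret as a rate. The following is a minimal sketch, not the dashboard's own method. It assumes that density means the fraction of sampled tokens on which the feature's activation is strictly positive, since the page states neither the exact definition nor the sample size. The function name `activation_density` and the toy activation data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumed definition (hypothetical): a token counts as "active" when its
    activation is strictly greater than the threshold.
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy data (hypothetical): 100,000 tokens, the feature fires on 6 of them.
acts = [0.0] * 100_000
for i in (10, 2_000, 35_000, 50_000, 77_777, 99_999):
    acts[i] = 1.3

density = activation_density(acts)
print(f"{density:.3%}")  # 0.006%
```

Under this assumed definition, a density of 0.006% corresponds to roughly 6 active tokens per 100,000, which makes the feature very sparse.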