INDEX
Explanations
terms that express emotional distress and suffering
Negative Logits
habit       -0.16
erg         -0.15
foy         -0.15
azen        -0.15
apos        -0.14
anie        -0.14
steam       -0.14
erten       -0.14
ãĤįãģĨ      -0.14
sworth      -0.14
Positive Logits
ilk         0.16
QByteArray  0.16
412         0.15
ormsg       0.15
ORIES       0.15
ingly       0.15
ноп         0.15
706         0.14
建          0.14
Dough       0.14
Activations Density: 0.005%