INDEX
Explanations
expressions of dissatisfaction or critique regarding situations or actions
Negative Logits
éĻ£        -0.16
Johnny     -0.16
overn      -0.16
ãĥ¬ãĥĵ     -0.16
idon       -0.15
ä¸ĢåĮº     -0.15
ream       -0.15
егоÑĢ      -0.14
éĺµ        -0.14
Johnny     -0.14
Positive Logits
itia       0.18
olla       0.17
ardu       0.16
ully       0.15
428        0.15
vrier      0.15
HEME       0.15
Tig        0.14
äº         0.14
κι         0.14
Activations Density 0.003%