INDEX
Explanations
phrases that indicate change or transformation
Negative Logits
sez      -0.16
Anyone   -0.14
ison     -0.14
geme     -0.13
ispers   -0.13
інки     -0.13
å¯       -0.13
eto      -0.13
ayo      -0.13
никто    -0.13
Positive Logits
everything   1.38
everything   1.23
Everything   1.16
Everything   1.11
tudo         0.95
alles        0.87
一切         0.77
tutto        0.65
anything     0.64
anything     0.58
Activations Density 0.526%