Explanations
phrases indicating the existence or continuation of established ideas or trends
Negative Logits
onis    -0.21
ogie    -0.16
rahim   -0.16
vro     -0.15
cu      -0.15
lí      -0.15
ONO     -0.14
Gow     -0.14
beg     -0.14
.bs     -0.14
Positive Logits
nothing   0.43
nothing   0.35
Nothing   0.34
NOTHING   0.32
Nothing   0.30
surprise  0.28
Surprise  0.28
nichts    0.26
novel     0.24
ничего    0.24
Activations Density 0.119%