INDEX
Explanations
The presence of the word "lo" and its variants in various contexts.
Negative Logits
chap          -0.15
rops          -0.14
asm           -0.14
collector     -0.14
ör            -0.14
elle          -0.14
sleeper       -0.14
.connector    -0.14
exped         -0.13
mid           -0.13
Positive Logits
AZY           0.17
YO            0.14
Https         0.14
çĴĥ           0.14
ative         0.14
ylland        0.14
ustum         0.14
Caucus        0.14
oba           0.14
аÑĤив         0.14
Activations Density 0.005%