INDEX
Explanations
Instances of the word "there" used to indicate location or existence
Negative Logits
sth      -0.18
venes    -0.17
ói       -0.16
usto     -0.15
ttp      -0.15
»        -0.15
abe      -0.15
chia     -0.14
enny     -0.14
amel     -0.14
Positive Logits
dort     0.16
alone    0.15
ìĦľ      0.15
alone    0.15
they     0.14
after    0.14
ision    0.14
ison     0.13
ec       0.13
siz      0.13
Activations Density: 0.052%