INDEX
Explanations
Occurrences of the word "the".
Negative Logits
oris        -0.16
on          -0.16
anja        -0.15
AWN         -0.15
anje        -0.14
onne        -0.14
onen        -0.14
elligence   -0.14
oner        -0.13
¸           -0.13
Positive Logits
guise        0.17
umbrella     0.15
ady          0.15
wing         0.15
unf          0.15
wings        0.14
_ttl         0.14
uard         0.14
ταν          0.14
whel         0.14
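The two logit lists rank vocabulary tokens by how strongly this feature pushes their output logits up or down. A common way such lists are produced (an assumption here; the dashboard does not say how it computed them, and some tools also fold in layer norm or other scaling) is to multiply the feature's decoder direction by the model's unembedding matrix and take the k lowest and k highest scores. The sketch below uses pure Python with a hypothetical toy vocabulary, decoder vector, and unembedding; none of these values come from the dashboard.

```python
def logit_effects(decoder_dir, W_U):
    """Project a feature's decoder direction through the unembedding.

    decoder_dir: list of d_model floats (hypothetical feature direction).
    W_U: unembedding as a list of d_model rows, each of length vocab_size.
    Returns one score per vocabulary token: sum_i decoder_dir[i] * W_U[i][t].
    """
    vocab_size = len(W_U[0])
    return [
        sum(decoder_dir[i] * W_U[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(scores, vocab, k):
    """Return the k most negative and k most positive (token, score) pairs."""
    order = sorted(range(len(scores)), key=lambda t: scores[t])
    negative = [(vocab[t], scores[t]) for t in order[:k]]
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]
    return negative, positive


# Hypothetical toy example: d_model = 2, vocabulary of 4 tokens.
vocab = ["guise", "umbrella", "oris", "on"]
decoder_dir = [1.0, -0.5]
W_U = [
    [0.3, 0.1, -0.1, 0.0],
    [0.2, -0.1, 0.1, 0.2],
]
scores = logit_effects(decoder_dir, W_U)
negative, positive = top_and_bottom(scores, vocab, k=2)
print("negative:", negative)
print("positive:", positive)
```

Because the score is a plain dot product, tokens at the top and bottom of the lists need not be related to the feature's activating contexts, which is one reason a logit list can look unrelated to a feature's written explanation.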
Activation density: 0.023%
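Activation density is usually read as the fraction of tokens in a sample on which the feature is active, so 0.023% corresponds to roughly 23 active tokens per 100,000. The sketch below assumes "active" means an activation strictly above zero (typical for ReLU-style features, but an assumption; the dashboard does not state its threshold) and uses a hypothetical list of activations rather than real data.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    activations: per-token activation values for one feature.
    threshold: assumed cutoff for "active"; 0.0 by default.
    """
    if not activations:
        raise ValueError("need at least one token")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 23 active tokens out of 100,000.
acts = [0.0] * 99_977 + [1.0] * 23
density = activation_density(acts)
print(f"Activation density: {density:.3%}")
```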