INDEX
Explanations
instances of the word "those" and similar demonstrative pronouns
Negative Logits
ault      -0.21
å¯        -0.15
zell      -0.15
ancell    -0.14
tober     -0.14
ndon      -0.14
ags       -0.14
عات       -0.14
verse     -0.14
rix       -0.13
Positive Logits
akin      0.15
curity    0.15
laughter  0.15
PyTuple   0.14
pra       0.14
fst       0.14
opsy      0.14
657       0.14
beiden    0.14
alara     0.14
Activations Density: 0.128%