Explanations
repeated uses of the verb "are" in various contexts
Negative Logits
itself      -0.28
rael        -0.18
server      -0.16
lem         -0.15
gnore       -0.15
omers       -0.15
ner         -0.14
ual         -0.14
enstein     -0.14
imize       -0.14
Positive Logits
themselves  0.34
thems       0.17
meisten     0.17
ца          0.17
yourselves  0.16
hell        0.15
hip         0.15
сами        0.15
iyan        0.15
psilon      0.14
Activations Density 0.519%