INDEX
Explanations
the word "are" occurring in sentences
repeated phrases of existence or identity
Negative Logits
Token     Logit
osate     -0.73
ossom     -0.70
udeau     -0.69
iates     -0.66
rouse     -0.64
rarily    -0.64
pedia     -0.64
urry      -0.63
leck      -0.63
mater     -0.62
Positive Logits
Token        Logit
nt           1.04
hereby       1.00
supposed     0.99
senal        0.96
not          0.94
definitely   0.89
gonna        0.89
wolf         0.85
also         0.85
going        0.85
Activations Density: 0.316%