Explanations
specific phrases including the word "the"
repeated instances of the article "the."
Negative Logits
/               -0.74
Mania           -0.68
Maher           -0.67
ornings         -0.64
zai             -0.64
SPONSORED       -0.63
ingly           -0.62
âĢij            -0.62
ESA             -0.61
days            -0.60
Positive Logits
entire          1.43
remainder       1.24
entirety        1.23
whole           1.16
slightest       1.10
same            1.08
offending       1.06
ses             1.02
requisite       1.00
aforementioned  1.00
Activations Density: 0.479%