Explanations
specific pronouns preceding descriptive words
instances of the word "the."
Negative Logits
Token        Logit
ward         -0.75
Supported    -0.74
lihood       -0.73
ESA          -0.72
thereby      -0.72
kell         -0.70
linked       -0.70
nesty        -0.69
amid         -0.69
according    -0.69
Positive Logits
Token        Logit
coolest      1.27
slightest    1.25
guy          1.15
whole        1.15
damn         1.13
goddamn      1.12
same         1.08
fuckin       1.07
fucking      1.05
crap         1.04
Activations Density: 0.846%