Explanations
instances of the pronoun "it" in various contexts
Negative Logits
anton        -0.67
persuasion   -0.63
dding        -0.59
entry        -0.59
knowledge    -0.59
hips         -0.58
agogue       -0.57
Gloria       -0.57
igmatic      -0.57
å½           -0.57
Positive Logits
'll          1.11
unes         1.03
chy          1.03
seems        1.00
alian        0.99
's           0.99
integrates   0.94
contains     0.94
anium        0.93
self         0.92
Activations Density 0.201%
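
The page does not define the density figure. As a minimal sketch, assuming it means the fraction of sampled token positions on which the feature has a nonzero activation, a percentage like the one above could be computed as follows. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled positions on which the
    feature fires at all. The source page does not state its definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 token positions, 2 of which activate the feature.
sample = [0.0] * 998 + [1.7, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.200%
```

Under this reading, a density of 0.201% would mean the feature fires on roughly 2 of every 1000 sampled tokens.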