INDEX
Explanations
the pronoun "it" and its various usages in different contexts
Negative Logits
assian      -0.68
Republic    -0.68
anton       -0.66
ullivan     -0.66
hips        -0.65
untled      -0.63
priv        -0.62
arthed      -0.62
noticed     -0.62
Corpus      -0.61
Positive Logits
ain         1.25
happens     1.12
happened    1.09
hurts       1.09
involves    1.07
lasts       1.01
chy         1.01
sucks       1.00
exists      1.00
seems       0.99
Activations Density 0.157%