Explanations
pronouns and verbs related to actions
instances of the pronoun "it" and similar terms indicating subjects or topics in context
Negative Logits
Token         Logit
Gender        -0.77
orse          -0.69
hart          -0.68
mma           -0.66
pat           -0.65
priv          -0.64
tnc           -0.64
Aid           -0.64
aca           -0.63
grass         -0.62
Positive Logits
Token         Logit
nonetheless   1.38
nevertheless  1.34
persisted     0.99
fortunately   0.97
still         0.89
theless       0.89
certainly     0.89
'll           0.89
couldn        0.87
cannot        0.87
Activations Density 0.359%