INDEX
Explanations
negations and words indicating uncertainty or disagreement
followed by "not"
not followed by context
Negative Logits
Token              Logit
ſeveral            -0.94
myſelf             -0.94
Baillargeon        -0.89
Efq                -0.88
Houſe              -0.81
Jefus              -0.80
Administrativna    -0.80
purpoſe            -0.79
bershka            -0.78
nahilalakip        -0.78
Positive Logits
Token              Logit
easy               0.58
even               0.57
pri                0.55
vium               0.54
true               0.52
pre                0.51
fair               0.50
c                  0.50
Even               0.50
ो                  0.49
Activations Density 0.220%