INDEX
Explanations
formal language negations
instances of the word "didn't" or its variations in different contexts
Negative Logits
attain          -0.62
alg             -0.62
ipes            -0.60
cot             -0.53
Article         -0.52
approximately   -0.52
PUBLIC          -0.52
reciprocal      -0.51
ens             -0.51
instances       -0.50

Positive Logits
didn             2.94
hadn             2.53
didn             2.51
wasn             2.38
weren            2.32
couldn           2.29
didnt            2.29
Didn             2.21
wouldn           2.10
hasn             1.98
Activations Density: 0.033%
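
As a rough illustration of what a density figure like this could mean, the sketch below assumes that density is the fraction of sampled tokens on which the feature's activation is above zero. That definition, the function name `activation_density`, and the sample data are all assumptions for illustration; the page itself does not say how the number was computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumed definition: a token counts as active when its activation
    is strictly greater than the threshold (zero by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 tokens, 33 of which activate the feature.
sample = [0.0] * 99_967 + [1.5] * 33
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.033% would correspond to roughly 33 active tokens per 100,000 sampled, which fits a feature that fires only on a narrow family of negated contractions.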