INDEX
Explanations
instances of negative statements
instances of the word "not"
Negative Logits
Token       Logit
CRIP        -0.67
æĥ          -0.67
++++        -0.65
éĥ          -0.64
GEN         -0.61
Pierre      -0.60
rift        -0.60
vice        -0.60
Pr          -0.60
Spectrum    -0.59
Positive Logits
Token          Logit
yet            1.09
been           1.08
epad           1.03
icably         1.01
icable         0.99
yet            0.96
necessarily    0.93
hin            0.93
gotten         0.91
been           0.89
Activations Density: 0.051%