INDEX
Explanations
sentences with the phrase "did not"
negations or statements about non-confirmation
Negative Logits
Token        Logit
Gems         -0.71
horizont     -0.66
proportions  -0.65
Gone         -0.65
Rebellion    -0.65
Heads        -0.64
ses          -0.63
mediocre     -0.63
Stories      -0.63
tons         -0.63
Positive Logits
Token        Logit
necessarily  1.05
icably       1.02
formally     1.01
officially   0.98
icable       0.98
disclose     0.92
divul        0.91
explicitly   0.88
specify      0.88
disclosed    0.87
Activations Density 0.186%
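
For readers unfamiliar with the density figure, the sketch below shows one common way such a number could be computed. It assumes that density means the share of token positions where the feature's activation is above zero; the dashboard itself does not state its definition. The function name `activation_density` and the sample values are hypothetical and are not taken from this feature's data.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature fires
    (activation > threshold). The dashboard may define it differently.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up sample: 1,000 token positions, of which 2 have nonzero activation.
sample = [0.0] * 998 + [1.5, 0.3]
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.186% would mean the feature fires on roughly 2 out of every 1,000 token positions.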