INDEX
Explanations
statements of fact or correctness
statements about correctness or truth
Negative Logits
andise          -0.80
ogether         -0.77
VERTISEMENT     -0.76
icipated        -0.75
heastern        -0.73
actionGroup     -0.71
berus           -0.71
FTWARE          -0.70
ð               -0.70
letal           -0.68
Positive Logits
referring       1.44
adamant         1.36
quoting         1.23
correct         1.22
proposing       1.20
arguing         1.19
pessimistic     1.17
mistaken        1.17
skeptical       1.16
aware           1.15
Activations Density 0.270%