Explanations
statements expressing doubt or skepticism about claims and predictions
Negative Logits
olon         -0.90
ioch         -0.88
orsi         -0.78
ankind       -0.78
forth        -0.74
married      -0.69
icycle       -0.69
iami         -0.68
TPS          -0.68
gradation    -0.67
Positive Logits
unfounded     1.07
inaccurate    1.07
accurate      1.03
erroneous     1.03
incorrect     1.02
untrue        1.01
outlandish    0.95
miscon        0.91
false         0.90
exaggerated   0.90
Activations Density: 0.323%