INDEX
Explanations
statements that confirm suspicions or established facts
references to data supporting common beliefs or suspicions
Negative Logits
disposal      -0.84
selection     -0.73
Doodle        -0.72
diversion     -0.68
selection     -0.67
compares      -0.67
itiz          -0.66
Volunteers    -0.65
gins          -0.65
PHOTO         -0.63
Positive Logits
anecd         0.91
debunked      0.91
already       0.90
loudly        0.88
previously    0.85
repeatedly    0.85
false         0.83
painfully     0.83
plainly       0.81
theorem       0.80
Activations Density: 0.206%