INDEX
Explanations
phrases indicating certainty or correctness
statements asserting correctness or verification of information
Negative Logits
icipated       -0.76
venant         -0.73
ulton          -0.72
estine         -0.71
clerosis       -0.70
berus          -0.68
igslist        -0.67
letal          -0.66
TOD            -0.65
Installation   -0.64
POSITIVE LOGITS
referring      1.49
correct        1.44
wrong          1.41
quoting        1.32
mistaken       1.25
kidding        1.24
incorrect      1.22
arguing        1.21
talking        1.20
joking         1.19
Activations Density 0.240%