INDEX
Explanations
adjectives describing opinion or evaluation
assertions and tentative claims in statements
Negative Logits
doors             -0.66
PLAN              -0.61
DO                -0.57
unfocusedRange    -0.57
SN                -0.57
Everywhere        -0.56
deadlines         -0.55
manual            -0.54
freezes           -0.53
XY                -0.53
Positive Logits
understatement    0.83
compare           0.82
conjecture        0.79
rued              0.79
recall            0.76
comparison        0.75
exaggeration      0.75
athom             0.75
haps              0.75
Compare           0.74
Activations Density: 0.831%