INDEX
Explanations
phrases related to claims, beliefs, and statements of certainty
expressions of belief or claims regarding factual statements
Negative Logits
Token          Logit
ratulations    -0.72
ntil           -0.66
perty          -0.64
Reply          -0.62
entanyl        -0.62
————           -0.59
dding          -0.59
endment        -0.59
essen          -0.58
untled         -0.57
Positive Logits
Token          Logit
constitutes     1.08
represents      1.01
belongs         1.01
deserves        1.00
qualifies       0.97
proves          0.93
could           0.91
resembles       0.91
amounted        0.91
contains        0.90
Activations Density 0.125%
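
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes that density means the fraction of sampled tokens on which the feature's activation is above zero. The page does not state this definition, and the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical. Under that assumption, 0.125% corresponds to the feature firing on about 1 token in 800.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of activations strictly above `threshold`.

    Assumed definition (not confirmed by the source page): a token counts
    as "active" when the feature's activation on it exceeds the threshold.
    """
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 800 tokens, the feature fires on exactly one of them.
sample = [0.0] * 799 + [3.2]
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low would suggest a sparse feature that fires only on a narrow set of contexts, consistent with the specific verbs listed among the positive logits.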