INDEX
Explanations
phrases related to expressing opinions or beliefs
statements of assertion or belief
Negative Logits
iates      -0.78
slips      -0.66
carts      -0.66
pled       -0.63
notices    -0.63
hens       -0.63
exhibits   -0.62
Supports   -0.62
pains      -0.60
Talks      -0.60
Positive Logits
cussion          0.77
not              0.76
indeed           0.75
moot             0.74
unclear          0.74
undoubtedly      0.72
definitely       0.72
straightforward  0.71
therefore        0.70
olate            0.69
Activations Density 0.264%