INDEX
Explanations
phrases expressing opinions or beliefs
phrases indicating speculation, belief, or opinion
Negative Logits
ĸļ              -0.78
ciating         -0.67
Himself         -0.66
Lear            -0.62
sac             -0.61
ository         -0.60
washer          -0.59
demo            -0.59
ļéĨĴ            -0.58
Delicious       -0.58
Positive Logits
sclerosis       0.67
errone          0.67
spoilers        0.66
incorrectly     0.66
blame           0.65
ael             0.63
underestimate   0.62
impeachment     0.60
benches         0.60
exaggeration    0.60
Activations Density: 0.188%