INDEX
Explanations
actions or outcomes that have a significant impact or consequences
action words that indicate declarations or statements of fact
Negative Logits
Token     Logit
sw        -0.70
away      -0.69
aneous    -0.68
nex       -0.64
pton      -0.61
squ       -0.59
tex       -0.59
xxx       -0.59
yon       -0.59
isp       -0.59
Positive Logits
Token     Logit
ometimes  1.05
ilver     0.94
ensibly   0.82
hift      0.81
omething  0.79
hirt      0.77
everal    0.72
olate     0.71
rals      0.69
uggest    0.68
Activations Density 0.467%
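
As a rough illustration of what a density figure such as 0.467% could mean, the sketch below assumes that density is the fraction of sampled tokens on which the feature's activation is above zero. That definition, the function name `activation_density`, and the toy activations are all assumptions made for illustration; none of them come from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumed definition of "density"; the page does not state how it is computed.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1000 tokens, of which 5 have a nonzero activation.
toy_activations = [0.0] * 995 + [1.2, 0.3, 2.5, 0.8, 0.1]

density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints 0.500%
```

Under this assumed definition, a density of 0.467% would correspond to the feature firing on roughly 4 to 5 tokens out of every 1000 sampled.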