INDEX
Explanations
verbs indicating action, particularly those related to making decisions or conclusions
common auxiliary verbs and forms of the verb "to do."
Negative Logits
Token     Logit
welf      -0.78
destro    -0.72
warr      -0.71
peat      -0.70
rul       -0.69
advoc     -0.67
corrid    -0.66
ーティ    -0.65
nodd      -0.65
dilig     -0.63
Positive Logits
Token     Logit
.         0.94
—         0.81
!         0.81
.[        0.78
;         0.78
.<        0.76
,         0.76
。        0.75
:         0.75
:)        0.74
Activations Density 0.576%