INDEX
Explanations
articles or statements urging specific actions or behaviors
commands or requests for action
Negative Logits
サーティワン    -0.80
çĦ              -0.76
nown            -0.68
marine          -0.65
puted           -0.63
miah            -0.63
amous           -0.62
esville         -0.62
Interstitial    -0.62
mberg           -0.61
Positive Logits
caution         1.21
restraint       0.94
us              0.88
patience        0.85
forgiveness     0.85
policymakers    0.83
travellers      0.83
electors        0.82
travelers       0.81
followers       0.81
Activations Density 0.049%