INDEX
Explanations
phrases related to judgments or opinions
expressions of judgment and significant actions
Negative Logits
erity       -0.59
ensibly     -0.57
nesday      -0.56
enta        -0.56
oaded       -0.52
then        -0.52
ensing      -0.51
erence      -0.51
sts         -0.51
reements    -0.50
Positive Logits
urban            0.59
doping           0.55
recreational     0.55
culinary         0.55
international    0.54
academic         0.52
sports           0.52
tourist          0.51
artistic         0.51
commercial       0.51
Activations Density: 1.761%