INDEX
Explanations
phrases indicating physical movements or actions involving objects
elements related to affection and attachment
Negative Logits
monarchy          -0.55
Accountability    -0.53
leground          -0.52
comprehens        -0.52
criminal          -0.52
pects             -0.52
Ratings           -0.51
Teachers          -0.51
rosso             -0.51
Colleges          -0.50
Positive Logits
gently            0.66
slur              0.59
refill            0.56
driveway          0.54
prest             0.53
photoc            0.53
chewing           0.53
aeros             0.53
mim               0.53
convertible       0.53
Activations Density: 1.912%