INDEX
Explanations
phrases related to group membership or identity
terms related to social roles, statuses, and classifications of individuals or groups
Negative Logits
nia         -0.88
ntax        -0.73
asus        -0.73
Temper      -0.69
BUG         -0.65
osi         -0.62
Demon       -0.61
Tenn        -0.60
Austral     -0.59
Synopsis    -0.59
Positive Logits
ensitive     0.87
hips         0.85
uggest       0.85
ometimes     0.83
themselves   0.83
extraord     0.81
hip          0.79
chool        0.78
pring        0.77
conduit      0.73
Activations Density: 0.191%
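The density figure is usually read as the fraction of token positions on which the feature has a nonzero activation. The sketch below shows one way such a number might be computed. It assumes that definition, and the function name `activation_density`, the `threshold` parameter, and the sample values are all hypothetical and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions where the feature fires.

    Assumption: a position counts as active when its activation is
    strictly greater than `threshold`.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical per-token activations for one feature (illustrative only,
# not data from this page): 3 active positions out of 1000.
sample = [0.0] * 997 + [0.4, 1.2, 0.05]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that reading, a density of 0.191% would mean the feature fires on roughly 2 of every 1,000 tokens in the sampled text.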