INDEX
Explanations
words related to emotions, societal attitudes, and power dynamics
themes related to emotions, power dynamics, and societal issues
Negative Logits
arnaev         -0.82
crisp          -0.68
guyen          -0.67
zl             -0.67
olulu          -0.64
nces           -0.63
pload          -0.63
alez           -0.61
iannopoulos    -0.59
illion         -0.59
Positive Logits
lessness       0.94
smanship       0.78
iveness        0.76
anasia         0.76
thood          0.76
fulness        0.73
ism            0.70
manship        0.69
iness          0.68
liness         0.66
Activations Density 0.378%