INDEX
Explanations
words related to describing physical attributes or characteristics
concepts and phrases related to social roles and expectations
Negative Logits
Token       Logit
ーン        -0.54
ruce        -0.50
aughtered   -0.48
enger       -0.46
liga        -0.46
efer        -0.46
arij        -0.45
querade     -0.43
Wem         -0.42
Rib         -0.42
Positive Logits
Token       Logit
entimes     0.60
etheless    0.59
POS         0.51
behavi      0.51
terness     0.50
consider    0.50
sugg        0.50
especially  0.50
nihil       0.49
behaviors   0.48
Activations Density: 1.526%
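
The density figure is usually read as the share of sampled tokens on which the feature fires. The sketch below illustrates that reading only; the function name `activation_density`, the zero threshold, and the sample values are all assumptions made for illustration, not the dashboard's actual computation or data.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" means the share of sampled tokens on which
    the feature is active. The dashboard may define it differently.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: the feature fires on 3 of 200 tokens.
sample = [0.0] * 197 + [0.8, 1.3, 0.2]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 1.500%
```

Under this reading, a density of 1.526% would mean the feature is active on roughly 15 of every 1,000 sampled tokens.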