Explanations
comparative phrases and comparisons
comparisons about social justice or inequality
Negative Logits
pole      -0.64
Summit    -0.64
trop      -0.62
agra      -0.61
HEAD      -0.60
atha      -0.60
PHOTOS    -0.59
estone    -0.59
ABE       -0.59
minist    -0.58
Positive Logits
it        0.69
you       0.68
experien  0.65
eem       0.65
they      0.63
he        0.63
Malk      0.61
apan      0.60
him       0.60
hov       0.60
Activations Density 0.325%