INDEX
Explanations
comparisons between social concepts and actions
Negative Logits
large       -0.76
ilion       -0.73
chwitz      -0.71
ourced      -0.71
ugar        -0.69
Cover       -0.68
fml         -0.68
Pradesh     -0.68
edIn        -0.68
ourcing     -0.68
Positive Logits
osphere     1.03
extraord    1.02
liest       0.95
archetype   0.94
who         0.93
hood        0.92
whom        0.84
iest        0.80
who         0.79
himself     0.79
Activations Density 0.306%