INDEX
Explanations
phrases related to institutional issues or settings
terms related to institutional structures and their implications
Negative Logits
person     -0.84
woman      -0.82
glass      -0.81
women      -0.80
model      -0.79
models     -0.79
lihood     -0.77
gl         -0.71
gur        -0.70
vich       -0.70
Positive Logits
ized        0.92
iary        0.89
inertia     0.85
ised        0.83
iation      0.83
aneously    0.82
iated       0.79
divid       0.74
aneous      0.74
izes        0.72
Activations Density 0.170%
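The lists and the density figure above are summary statistics of a single feature. As a rough illustration only, the sketch below assumes a common convention for dashboards of this kind: logit scores are taken as the dot product of the feature's decoder direction with each token's unembedding column, and activation density is the fraction of tokens on which the feature fires (activation > 0). Neither convention is confirmed by this page, and the function names and toy inputs are hypothetical, not this feature's actual data.

```python
# Hypothetical sketch (assumed conventions, toy data): how top/bottom logit
# lists and activation density might be derived for one feature.

def top_and_bottom_logits(decoder_dir, unembed, vocab, k=3):
    """Score every token by the dot product of the feature's decoder direction
    with that token's unembedding column (unembed is d_model x vocab), then
    return the k most positive and k most negative (token, score) pairs."""
    scores = []
    for tok_idx, token in enumerate(vocab):
        score = sum(d * unembed[dim][tok_idx] for dim, d in enumerate(decoder_dir))
        scores.append((token, score))
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


def activation_density(activations):
    """Fraction of tokens on which the feature's activation is strictly positive.
    A dashboard would typically display this as a percentage."""
    if not activations:
        return 0.0
    return sum(1 for a in activations if a > 0) / len(activations)


# Toy usage: a 2-dimensional "model" with a 4-token vocabulary (made-up values).
pos, neg = top_and_bottom_logits(
    decoder_dir=[1.0, -0.5],
    unembed=[[0.2, -0.4, 0.9, 0.0],
             [0.1, 0.6, -0.2, 0.3]],
    vocab=["a", "b", "c", "d"],
    k=2,
)
print("positive:", pos)
print("negative:", neg)
print("density: {:.3%}".format(activation_density([0.0, 0.5, 0.0, 0.0, 1.2])))
```

Under these assumptions, a density of 0.170% would mean the feature is active on roughly 17 of every 10,000 tokens.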