INDEX
Explanations
terminology related to prototyping and stereotypes
Negative Logits
attendant     -0.69
attendants    -0.66
Cheong        -0.63
FO            -0.62
physician     -0.60
rush          -0.59
IRO           -0.59
Frey          -0.59
Rae           -0.58
GENERAL       -0.57
Positive Logits
ical          1.38
otyp          1.18
ipl           1.16
ically        1.09
ing           1.08
ed            1.05
ĭ             1.02
ifact         1.00
itive         0.99
inary         0.98
Activations Density 0.003%