INDEX
Explanations
references to individuals and groups
Negative Logits
Racine     -0.91
cauſe      -0.85
Dreyfus    -0.82
raiſ       -0.78
MTA        -0.77
houſe      -0.77
pædia      -0.77
oídos      -0.76
].)        -0.75
ſta        -0.75
Positive Logits
them       1.51
THEM       1.34
Them       1.33
Them       1.32
him        1.30
them       1.23
HIM        1.19
Him        1.18
Him        1.16
HIM        1.06
Activations Density: 0.088%