INDEX
Explanations
references to individuals and groups
Negative Logits
raiſ      -1.08
Dreyfus   -1.05
uſed      -1.02
cauſe     -1.01
purpoſe   -1.01
poffe     -0.98
ſtate     -0.96
Weyl      -0.95
Athenian  -0.94
houſe     -0.94
Positive Logits
them      0.88
HIM       0.87
him       0.85
THEM      0.84
Them      0.84
м         0.83
Them      0.81
Him       0.80
Him       0.80
m         0.76
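A minimal sketch of how lists like the two above are commonly produced, assuming (this is not stated on the page) that each token's score is the feature's decoder direction projected through the model's unembedding matrix, with any final layer norm ignored. The names `logit_effects`, `top_logits`, `decoder_dir`, `unembed`, and `vocab_strs` are hypothetical and chosen for illustration only.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction onto every vocabulary token.

    Assumption: unembed is laid out as d_model rows x vocab columns,
    so unembed[i][t] is the weight from residual dimension i to token t.
    """
    d_model = len(decoder_dir)
    vocab = len(unembed[0])
    return [sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
            for t in range(vocab)]


def top_logits(decoder_dir: Sequence[float],
               unembed: Sequence[Sequence[float]],
               vocab_strs: Sequence[str],
               k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs."""
    effects = logit_effects(decoder_dir, unembed)
    ranked = sorted(zip(vocab_strs, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative scores first
    positive = ranked[::-1][:k]    # most positive scores first
    return negative, positive


if __name__ == "__main__":
    # Toy 2-dimensional model with a 3-token vocabulary (made-up numbers).
    neg, pos = top_logits([1.0, -0.5],
                          [[0.2, -1.0, 0.5], [0.4, 0.2, -0.6]],
                          ["him", "ſtate", "them"], k=1)
    print(neg, pos)
```

Under these assumptions, a feature like this one would push up object pronouns ("them", "him") and push down long-s word fragments and some proper nouns.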
Activation Density: 0.073%
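A small sketch of what a density figure like this usually measures, assuming it is the percentage of sampled tokens on which the feature's activation is above zero. The page does not state the threshold or the sample size, so `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" counts strictly positive activations by default.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # 73 active tokens out of 100,000 sampled tokens (illustrative numbers).
    acts = [0.0] * 99_927 + [1.5] * 73
    print(f"{activation_density(acts):.3f}%")
```

With these illustrative counts, the feature fires on 73 of every 100,000 tokens, which shows how sparse a 0.073% density is.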