INDEX
Explanations
references to a specific individual or variations of their name
Negative Logits
herit    -0.18
inois    -0.17
ury      -0.15
utor     -0.15
ADF      -0.15
rone     -0.14
ems      -0.14
ings     -0.14
oria     -0.14
ple      -0.14
Positive Logits
adj        0.17
lected     0.15
assa       0.15
adir       0.15
аÑģÑģ       0.14
adj        0.14
ingleton   0.14
enek       0.14
åģ         0.14
buck       0.13
Activations Density 0.018%