Explanations
references to specific locations or institutions
Negative Logits
Token    Logit
orm      -0.17
ay       -0.15
ad       -0.15
Robin    -0.15
ar       -0.15
ack      -0.15
بوابة    -0.15
err      -0.15
ruh      -0.15
aron     -0.14
Positive Logits
Token         Logit
Petersburg    0.35
Clair         0.30
Augustine     0.29
Mary          0.28
Johns         0.28
Cloud         0.27
Francis       0.27
Andrews       0.27
Jude          0.27
Louis         0.27
Activations Density: 0.018%