Explanations
proper nouns and names of individuals in news articles
Negative Logits
Token         Logit
ulhu          -0.44
ICLE          -0.43
»Ĵ            -0.42
HAEL          -0.42
lished        -0.41
lehem         -0.41
escription    -0.41
ģĸ            -0.41
corrid        -0.41
ASED          -0.40
Positive Logits
Token         Logit
horn          0.46
Dragonbound   0.46
zai           0.42
hirt          0.42
iland         0.41
oxide         0.41
velt          0.40
acht          0.40
Revenge       0.40
ppa           0.40
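The two logit lists above are rankings of per-token values: the most negative first, then the most positive. A minimal sketch of that ranking step follows, assuming each value is the feature's contribution to that token's output logit (commonly obtained by projecting the feature's decoder direction through the model's unembedding; the dashboard itself does not state this). The function name `top_logits` is hypothetical, and the sample uses a few values copied from the lists.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, float]


def top_logits(contribs: Dict[str, float], k: int) -> Tuple[List[Pair], List[Pair]]:
    """Return the k most negative and k most positive (token, value) pairs.

    Hypothetical helper: assumes `contribs` maps each token to the feature's
    logit contribution for that token.
    """
    # Sort ascending by value so the most negative contributions come first.
    ranked = sorted(contribs.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# A few (token, value) pairs taken from the lists above.
sample = {
    "ulhu": -0.44, "ICLE": -0.43, "HAEL": -0.42,
    "horn": 0.46, "zai": 0.42, "iland": 0.41,
}
neg, pos = top_logits(sample, 2)
print(neg)  # [('ulhu', -0.44), ('ICLE', -0.43)]
print(pos)  # [('horn', 0.46), ('zai', 0.42)]
```

Tied values (for example the several tokens at 0.41) keep whatever order `sorted` gives them, so a real dashboard may break ties differently.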
Activation Density: 11.391%
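Activation density is usually read as the share of tokens on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero (the dashboard does not state its threshold or sample size). The helper name and the toy activations are hypothetical and are not this feature's data.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Hypothetical helper: assumes density = active tokens / all tokens * 100.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical toy activations over 8 tokens (not real data).
acts = [0.0, 1.3, 0.0, 0.0, 0.7, 0.0, 0.0, 0.0]
print(f"{activation_density(acts):.3f}%")  # 25.000%
```

Under this reading, a density of 11.391% would mean the feature is active on roughly one token in nine.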