INDEX
Explanations
proper nouns and important figures in various contexts
Negative Logits
latter      -0.17
zione       -0.16
ãģĦãĤĭ      -0.14
writing     -0.14
åħĴ         -0.14
ت           -0.14
ìĦ±ìĿ´      -0.14
abouts      -0.13
ëĤĺ         -0.13
listed      -0.13
Positive Logits
dÄĽ         0.17
sworth      0.15
rophy       0.14
ëŁ¼         0.14
/dr         0.14
ìĦľ         0.14
itories     0.13
ktop        0.13
pearance    0.13
pillar      0.13
Activations Density 1.977%