Explanations
proper nouns, specifically names of people
Negative Logits
Fle	-0.16
ä¸Ī	-0.15
صد	-0.14
опиÑģ	-0.14
ãĥĨãĥ«	-0.14
akes	-0.14
ake	-0.13
Flush	-0.13
tel	-0.13
epar	-0.13
Positive Logits
baugh	0.15
gnore	0.15
burg	0.15
indre	0.14
780	0.14
ÙĨÙ쨳Ùĩ	0.14
supra	0.13
ãģŁãģ¡ãģ®	0.13
ĶåĽŀ	0.13
OURS	0.13
Activations Density 0.040%