INDEX
Explanations
names of people and places
names of people, places, or brands, particularly in the context of specific narratives or reports
Negative Logits
distingu    -0.99
eleph       -0.92
Au          -0.91
hemor       -0.86
¥ŀ          -0.80
ò           -0.78
Gou         -0.78
pione       -0.77
agu         -0.77
srf         -0.77
Positive Logits
ney         1.30
ton         1.19
TON         1.14
neys        1.02
NEY         1.02
ny          0.94
lyn         0.91
Madison     0.86
mia         0.85
lin         0.84
Activations Density: 0.154%
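
For readers unfamiliar with the density figure, the sketch below shows one common way such a number is computed. It assumes density means the fraction of sampled token positions on which the feature's activation is above zero; the page does not state its exact definition, so treat that as an assumption. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from this dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of token positions where the
    feature fires at all; the dashboard may define it differently.
    """
    if len(activations) == 0:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 2000 token positions, 3 of which activate the feature.
sample = [0.0] * 1997 + [0.7, 1.2, 3.4]
print(f"{activation_density(sample):.3%}")  # prints 0.150%
```

A density this low would indicate a sparse feature: it fires on only a handful of tokens out of every few thousand, which is consistent with a feature tied to specific name-like suffixes.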