Explanations
proper names, particularly those of political figures and entities
Negative Logits
aylor       -0.08
ernes       -0.08
andes       -0.07
ulet        -0.07
ushima      -0.07
iglia       -0.07
ople        -0.07
iola        -0.07
apus        -0.07
κού         -0.07
Positive Logits
LOY          0.06
Wire         0.06
ationToken   0.05
근           0.05
431          0.05
512          0.05
εδ           0.05
938          0.05
Caribbean    0.05
pillar       0.05
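Lists of negative and positive logits like the ones above are commonly derived from a feature's direct effect on the output vocabulary. The sketch below assumes (this page does not say so) that the feature writes a decoder direction `w_dec` into the residual stream and that the model turns it into logits through an unembedding matrix `W_U`; the functions `logit_effects` and `top_logits` and the toy values are hypothetical illustrations, not this dashboard's actual code.

```python
# Hypothetical sketch: rank a feature's direct effect on output logits.
# Assumption: the feature's decoder direction `w_dec` (length d_model) is
# projected through an unembedding matrix `W_U` (d_model rows x vocab columns).

def logit_effects(w_dec, W_U):
    """Compute w_dec @ W_U: one logit effect per vocabulary entry."""
    vocab_size = len(W_U[0])
    return [sum(w_dec[i] * W_U[i][j] for i in range(len(w_dec)))
            for j in range(vocab_size)]

def top_logits(effects, vocab, k):
    """Return the k most negative and the k most positive (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative effect first
    positive = ranked[::-1][:k]    # most positive effect first
    return negative, positive

# Toy example with made-up numbers (not taken from this feature).
vocab = ["a", "b", "c", "d"]
w_dec = [1.0, -0.5]
W_U = [[0.1, -0.2, 0.3, 0.0],
       [0.2, 0.4, -0.1, 0.0]]
neg, pos = top_logits(logit_effects(w_dec, W_U), vocab, k=1)
print(neg, pos)
```

Here token `b` receives the most negative effect and `c` the most positive one, mirroring how the two lists on this page are ordered.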
Activation Density: 0.001%
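An activation density of 0.001% corresponds to a fraction of 10^-5, i.e. roughly one activating token per 100,000 tokens. A minimal sketch of that arithmetic follows, assuming (hypothetically) that density means the share of tokens whose activation is strictly above zero; the helper `activation_density` and its `threshold` parameter are illustrative, not part of this dashboard.

```python
# Hypothetical sketch: activation density as the share of tokens where a
# feature fires. Assumption: "fires" means activation > threshold (default 0).

def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)

# One firing token out of 100,000 gives the density shown on this page.
acts = [0.0] * 99_999 + [2.3]
density = activation_density(acts)
print(f"{density:.3%}")  # prints "0.001%"
```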