INDEX
Explanations
proper nouns, potentially related to sports or news
Negative Logits
usher        -0.85
engers       -0.77
fri          -0.71
ibel         -0.68
Catal        -0.67
azar         -0.66
Archdemon    -0.63
inoa         -0.62
ossier       -0.62
artney       -0.62
Positive Logits
Ort          0.81
pheus        0.81
akov         0.78
anka         0.76
heastern     0.72
stall        0.71
yy           0.70
mond         0.68
craft        0.67
gdala        0.67
Activations Density: 11.422%