INDEX
Explanations
proper nouns, specifically names of individuals
NEGATIVE LOGITS
ï¸ı            -0.77
Compass        -0.68
ĸļ             -0.68
UGE            -0.67
Totem          -0.64
netflix        -0.63
Orient         -0.62
usercontent    -0.60
Esk            -0.59
lvl            -0.58
POSITIVE LOGITS
elli           0.70
kov            0.69
wagen          0.68
amins          0.65
ovic           0.64
Å¡             0.64
igi            0.62
ucci           0.60
hof            0.59
ello           0.58
Activation density: 0.104%
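
A few tokens in the logit lists above (ï¸ı, ĸļ, Å¡) look like mojibake, but they may simply be vocabulary entries printed in a byte-level BPE display alphabet. The sketch below assumes the model uses a GPT-2-style byte-level vocabulary, which this page does not state. The helper names (bytes_to_unicode, token_bytes, decode_token) are illustrative and not part of the dashboard. Under that assumption, Å¡ decodes to "š", which would fit a feature about personal names.

```python
def bytes_to_unicode():
    """Byte -> printable character table in the style of GPT-2's byte-level BPE.

    ASSUMPTION: the dashboard's model uses this mapping; the source does not say so.
    """
    # Bytes that already have a printable character keep their own code point.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to an unused code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: display character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_bytes(token: str) -> bytes:
    """Recover the raw bytes behind a token's display string."""
    return bytes(UNICODE_TO_BYTE[ch] for ch in token)


def decode_token(token: str) -> str:
    """Decode a display string to text; invalid UTF-8 becomes U+FFFD."""
    return token_bytes(token).decode("utf-8", errors="replace")


# The three odd-looking tokens from the lists above.
for tok in ["Å¡", "ï¸ı", "ĸļ"]:
    print(repr(tok), token_bytes(tok).hex(), repr(decode_token(tok)))
```

A token whose bytes are only part of a multi-byte UTF-8 sequence will not decode to readable text on its own, so the raw hex is printed alongside the decoded form.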