INDEX
Explanations
proper names of people, potentially celebrities
mentions of specific names and terms related to individuals and products
Negative Logits
amaz      -0.82
umar      -0.76
pling     -0.75
plings    -0.74
ktop      -0.72
Horus     -0.72
metry     -0.71
owship    -0.69
holders   -0.69
weed      -0.68
Positive Logits
ework     0.77
issance   0.76
ype       0.76
ット      0.74
ingham    0.72
orate     0.71
ァ        0.70
eful      0.69
esy       0.69
eur       0.69
Activations Density: 0.018%
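
The density figure is usually read as the fraction of sampled token positions on which the feature fires at all. The sketch below illustrates that reading under stated assumptions: the function name `activation_density`, the `threshold` parameter, and the sample values are all hypothetical and are not taken from the dashboard or its tooling.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions where the feature is active.

    Assumption: "active" means the activation is strictly above `threshold`.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up sample: 100,000 token positions, 18 of which activate the feature.
# These numbers are illustrative only; the real sample size is not given.
sample = [0.0] * 99_982 + [1.0] * 18
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low means the feature is highly sparse: it is silent on nearly every token and fires only on a small, specific set of contexts.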