INDEX
Explanations
names of individuals in positions of power or prominence
proper nouns, specifically names of people or entities
Negative Logits
Token       Logit
──          -0.86
LEASE       -0.79
▬           -0.76
sburgh      -0.67
EEE         -0.64
VICE        -0.62
acebook     -0.62
LCS         -0.62
··          -0.61
FontSize    -0.60
Positive Logits
Token       Logit
iani        0.99
zynski      0.94
hair        0.94
iman        0.92
hai         0.91
endi        0.88
hari        0.88
hart        0.87
oub         0.86
kil         0.85
Activations Density 0.253%
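
The density figure above can be read as the share of token positions on which the feature fires. The following is a minimal sketch under that assumption. The function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from the dashboard or its tooling.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as "active" on a token when its activation
    is strictly greater than `threshold` (0.0 here).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 2,530 of 1,000,000 token positions.
sample = [1.0] * 2_530 + [0.0] * (1_000_000 - 2_530)
print(f"{activation_density(sample):.3%}")  # prints 0.253%
```

A density this low means the feature is sparse: it is inactive on the vast majority of tokens, which fits a feature tied to a narrow category such as name fragments.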