Explanations
names of individuals
proper names, particularly those of individuals
Negative Logits
Token          Logit
ModLoader      -0.76
etheless       -0.74
LEASE          -0.72
Melania        -0.68
theless        -0.67
underwater     -0.66
Millennials    -0.66
Oprah          -0.65
UTERS          -0.65
Alibaba        -0.64
Positive Logits
Token          Logit
zen            1.02
atz            0.96
acci           0.96
burn           0.95
utsch          0.94
inger          0.92
itz            0.92
inski          0.92
ham            0.91
owski          0.91
Activations Density: 0.331%