Explanations
names of individuals
mentions of specific individuals, particularly in relation to their popularity or notoriety
Negative Logits
ãĥ£       -0.77
ãĥ¯ãĥ³    -0.76
ãĥĢ       -0.75
pmwiki    -0.74
ãĥł       -0.72
stood     -0.69
CENT      -0.67
pron      -0.67
ãĤŃ       -0.66
onyms     -0.66
Positive Logits
Webb      1.29
Dixon     0.89
inson     0.86
swick     0.83
owship    0.82
icz       0.82
irth      0.80
zos       0.78
Weld      0.77
Hutch     0.76
Activation density: 0.008%
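The sketch below illustrates how lists like the ones above are commonly produced, assuming the usual "logit lens" approach for autoencoder features: the feature's decoder direction is dotted with each vocabulary token's unembedding vector, and the most positive and most negative scores are reported. Activation density is taken here as the percentage of token positions where the feature's activation is above zero, so 0.008% corresponds to roughly 8 firing positions per 100,000 tokens. The function names, the toy two-dimensional vectors, and the exact pipeline are hypothetical; the dashboard's real computation (normalization, dataset, thresholds) may differ.

```python
from typing import Dict, List, Sequence, Tuple


def token_logit_weights(decoder_vec: Sequence[float],
                        unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Hypothetical helper: score each token by dotting the feature's
    decoder direction with that token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_vec, col))
            for tok, col in unembed.items()}


def top_and_bottom(weights: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring tokens (largest first) and the
    k lowest-scoring tokens (most negative first)."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return positive, negative


def activation_density(activations: Sequence[float]) -> float:
    """Percentage of token positions where the feature fires (activation > 0)."""
    if not activations:
        return 0.0
    return 100.0 * sum(1 for a in activations if a > 0) / len(activations)


# Toy example with made-up 2-d vectors (not real model weights).
decoder_vec = [1.0, -0.5]
unembed = {
    "Webb": [1.0, 0.0],
    "Dixon": [0.5, -0.5],
    "stood": [-0.5, 0.5],
    "pmwiki": [0.0, 1.0],
}
weights = token_logit_weights(decoder_vec, unembed)
positive, negative = top_and_bottom(weights, k=2)
density = activation_density([0.0, 2.3, 0.0, 0.0])
print(positive, negative, density)
```

Sorting once and slicing from both ends keeps the positive and negative lists consistent with each other; for a real vocabulary of tens of thousands of tokens, a partial selection (for example `heapq.nlargest`) would avoid a full sort.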