INDEX
Explanations
proper nouns, particularly first names
names of individuals
names of people mentioned in relation to events or opinions
Negative Logits
ãĥ¼ãĥ³    -0.85
igion     -0.79
td        -0.78
ruction   -0.75
ague      -0.75
ebook     -0.72
ãĥ¢       -0.72
mediated  -0.68
diarr     -0.66
itri      -0.65
Positive Logits
Hank      0.88
enstein   0.73
heon      0.73
keye      0.72
ukong     0.72
iamond    0.70
Hawks     0.70
alach     0.69
buster    0.68
ilda      0.68
Activations Density 0.016%
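
Two of the tokens in the negative-logit list (ãĥ¼ãĥ³ and ãĥ¢) look like raw vocabulary strings from a byte-level BPE tokenizer rather than readable text. The sketch below assumes the model uses the GPT-2 style byte-to-character table (the source does not name the tokenizer, so this is an assumption) and shows how such strings could be mapped back to UTF-8. The helper names `bytes_to_unicode` and `decode_token` are illustrative, not part of the dashboard.

```python
def bytes_to_unicode():
    """Byte -> printable-character table in the style of GPT-2 byte-level BPE.

    Assumption: the feature's model uses this table; the source does not say.
    """
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    chars = printable[:]
    n = 0
    # Every remaining byte is shifted to an unused code point at 256 and above.
    for b in range(256):
        if b not in printable:
            printable.append(b)
            chars.append(256 + n)
            n += 1
    return dict(zip(printable, map(chr, chars)))


# Inverse table: vocabulary character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level vocabulary string back into UTF-8 text."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # A single token may hold only part of a multi-byte character,
    # so invalid sequences are replaced instead of raising.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĥ¼ãĥ³", "ãĥ¢", "Hank"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, ãĥ¼ãĥ³ decodes to the katakana sequence ーン and ãĥ¢ to モ, while plain ASCII tokens such as Hank are unchanged.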