Explanations
names of individuals, particularly surnames
names of individuals, particularly those associated with commentary or opinions
Negative Logits
Token       Logit
graded      -0.79
words       -0.78
grading     -0.78
eat         -0.74
etermined   -0.73
ocaust      -0.72
usable      -0.71
jong        -0.71
spare       -0.70
ournal      -0.70
Positive Logits
Token       Logit
Weaver      1.24
lings       0.72
¯           0.68
Remastered  0.67
Ashe        0.67
gren        0.67
eers        0.66
ILA         0.65
Widow       0.64
Firefly     0.64
Activations Density 0.007%