INDEX
Explanations
names of individuals
proper nouns and names, particularly related to individuals and cultural references
Negative Logits
FW            -0.79
ted           -0.77
tf            -0.73
hered         -0.73
self          -0.71
Continental   -0.69
lay           -0.68
Insect        -0.67
sch           -0.66
sheet         -0.64
Positive Logits
aji           1.30
pora          1.10
ajor          0.89
ison          0.89
ision         0.88
veyard        0.86
ournal        0.86
igrants       0.83
ilee          0.81
irez          0.78
Activations Density 0.016%