Explanations
names of individuals
proper nouns and specific names related to individuals or specific entities
Negative Logits
Token        Logit
berman       -0.77
Fathers      -0.67
onut         -0.66
omez         -0.63
dating       -0.61
pseudonym    -0.59
İĭ           -0.59
opez         -0.59
emort        -0.58
cab          -0.58
Positive Logits
Token        Logit
IELD          0.94
âķIJâķIJ      0.87
IFT           0.85
ments         0.84
adow          0.82
ings          0.82
resh          0.81
itic          0.77
adows         0.75
aun           0.73
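The two logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. A common way SAE dashboards produce such lists, assumed here rather than confirmed for this page, is to project the feature's decoder direction through the model's unembedding matrix W_U and sort the resulting scores. The pure-Python sketch below shows that idea; `logit_effects`, `top_and_bottom`, and the toy vectors are hypothetical illustrations, and a real dashboard may normalize or rescale the scores differently.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Score every vocabulary token by decoder_dir @ W_U.

    decoder_dir: the feature's decoder vector (length d_model).
    unembed: hypothetical W_U given as d_model rows, each of length vocab_size.
    """
    vocab_size = len(unembed[0])
    scores = [0.0] * vocab_size
    for k, weight in enumerate(decoder_dir):
        row = unembed[k]
        for t in range(vocab_size):
            # Add the contribution of residual dimension k to token t's logit.
            scores[t] += weight * row[t]
    return scores


def top_and_bottom(scores: Sequence[float], vocab: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Toy example (hypothetical numbers): d_model = 2, vocabulary of 3 tokens.
decoder_dir = [1.0, -0.5]
unembed = [[0.2, -0.4, 0.9],
           [0.6, 0.1, -0.2]]
vocab = ["a", "b", "c"]
scores = logit_effects(decoder_dir, unembed)
negative, positive = top_and_bottom(scores, vocab, k=1)
print(negative, positive)
```

With a real model, `decoder_dir` would be one row of the SAE decoder and `unembed` the model's full W_U, and k = 10 would yield lists shaped like the ones above.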
Activation Density: 0.008%
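Activation density is the share of sampled tokens on which the feature fires. A density of 0.008% means roughly 8 firings per 100,000 tokens, so this feature is very sparse. The sketch below assumes "fires" means an activation strictly above a threshold of zero and that the value is reported as a percentage; the function name `activation_density` and its `threshold` parameter are hypothetical, and the dashboard's actual sample and cutoff are not stated.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is strictly above `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Toy sample (hypothetical): 100,000 tokens, the feature fires on 8 of them.
acts = [0.0] * 100_000
for i in range(0, 80_000, 10_000):
    acts[i] = 1.5
print(f"{activation_density(acts):.3f}%")  # prints 0.008%
```

Raising `threshold` above zero is one way to ignore tiny spurious activations, at the cost of making the reported density depend on that choice.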