Explanations
names, likely of people
the mention of specific names or identities
Negative Logits
Token      Logit
nces       -0.80
ILY        -0.79
minist     -0.78
rance      -0.76
scl        -0.75
gged       -0.75
cale       -0.74
gently     -0.74
INESS      -0.74
backer     -0.74
Positive Logits
Token      Logit
atalie     0.80
Devi       0.78
Loren      0.76
eus        0.74
Brus       0.73
itri       0.67
Nik        0.66
udeau      0.65
Port       0.65
Natalie    0.64
Activations Density: 0.043%