INDEX
Explanations
proper nouns and names, particularly those related to political figures
references to specific names and organizations, particularly individuals and the CPS (Child Protective Services)
Negative Logits
prus      -0.75
lette     -0.74
riors     -0.71
rylic     -0.70
ights     -0.68
icious    -0.66
grabs     -0.62
ctl       -0.62
omics     -0.62
rise      -0.61
Positive Logits
deen      0.83
Pengu     0.82
arsh      0.81
Ans       0.75
heim      0.74
illon     0.72
arette    0.71
liam      0.71
wered     0.71
Spiegel   0.70
Activations Density: 0.103%