Explanations
mentions of a specific individual's name in a negative context
words related to an individual's name or identity
Negative Logits
Token      Logit
schild     -0.78
space      -0.76
enegger    -0.74
line       -0.72
starter    -0.69
sheet      -0.69
tal        -0.69
birds      -0.66
lings      -0.65
hawk       -0.64
Positive Logits
Token      Logit
ñ          0.94
edia       0.90
pered      0.89
cess       0.88
uthor      0.84
pload      0.83
odcast     0.81
pa         0.81
resa       0.80
apa        0.80
Activations Density: 0.011%