Explanations
names of people being targeted or affected by violence or injustice
Negative Logits
phis        -1.00
s           -0.82
acular      -0.82
ivas        -0.80
ivery       -0.80
enium       -0.80
ertodd      -0.79
achusetts   -0.79
imates      -0.78
neapolis    -0.78
Positive Logits
zz          0.95
zza         0.88
ÄŁ          0.86
ÅŁ          0.85
gger        0.83
ñ           0.82
ça          0.80
FORE        0.79
zzi         0.76
legates     0.74
Activations Density: 0.043%