INDEX
Explanations
female first names
references to specific individuals with the name "Guevara."
Negative Logits
atform         -0.79
aceutical      -0.71
practice       -0.70
Agg            -0.69
vertisement    -0.64
acl            -0.63
icum           -0.62
arna           -0.61
etheless       -0.61
acterial       -0.58
Positive Logits
lla            1.18
llo            1.18
hler           0.95
ll             0.94
lly            0.91
lette          0.89
hner           0.89
pees           0.85
lder           0.84
hl             0.83
Activations Density 0.014%