Explanations
references to slavery
Negative Logits
thenReturn    -0.47
"><           -0.47
?<            -0.46
Michelle      -0.46
<             -0.45
Pic           -0.45
Designated    -0.44
<             -0.43
box           -0.43
Neil          -0.43
Positive Logits
slavery       2.08
Slavery       1.88
slavery       1.70
ensla         0.96
esclavos      0.94
escla         0.91
escra         0.89
Sla           0.89
HasFactory    0.84
enslaved      0.73
Activations Density 0.004%