Explanations
words related to torture and physical abuse
Negative Logits
donor          -0.97
earance        -0.82
BY             -0.79
donation       -0.78
iod            -0.77
ership         -0.75
Prospect       -0.75
magnification  -0.74
Ü              -0.73
ACP            -0.72
Positive Logits
urous          1.37
oise           1.35
illas          1.11
anamo          0.97
ured           0.96
aste           0.95
teenth         0.95
uous           0.94
uses           0.92
ificate        0.92
Activation density: 0.796%