INDEX
Explanations
mentions of suffering and related concepts, such as pain, exploitation, and freedom
references to suffering and its impact on individuals and society
Negative Logits
Token       Logit
sure        -0.67
sports      -0.67
Collider    -0.66
latch       -0.65
cluding     -0.63
clude       -0.62
ioch        -0.62
lev         -0.62
leans       -0.61
ouncing     -0.60
Positive Logits
Token       Logit
inflicted   0.90
lehem       0.83
Nadu        0.77
endured     0.77
hani        0.76
setbacks    0.74
lessly      0.74
nesses      0.73
suffered    0.73
suffering   0.72
Activations Density: 0.025%