Explanations
references to suffering and related topics
references to suffering and its impacts on individuals and society
Negative Logits
sure       -0.65
eton       -0.64
sports     -0.62
Mafia      -0.60
Shots      -0.60
Collider   -0.59
lev        -0.59
clear      -0.59
uren       -0.59
smoking    -0.58
Positive Logits
inflicted  0.91
lehem      0.80
Palest     0.79
ansson     0.77
lessly     0.76
endured    0.74
Nadu       0.74
hani       0.73
suffered   0.73
horribly   0.72
Activation Density: 0.021%