Explanations
Instances of individuals taking action to assist others in distress
Negative Logits
jom  -0.07
że  -0.07
anki  -0.07
egt  -0.07
IALIZED  -0.07
aldi  -0.07
ä¼ı  -0.07
ottes  -0.07
usercontent  -0.07
.hwp  -0.07
Positive Logits
intervention  0.11
rescue  0.10
helping  0.10
intervene  0.10
resc  0.10
help  0.10
interventions  0.09
interven  0.09
intervened  0.09
Intervention  0.08
Activation Density 0.081%