INDEX
Explanations
phrases related to physical harm or injury
terms related to physical harm and bodily injuries
Negative Logits
Token     Logit
rams      -0.77
resses    -0.75
oise      -0.74
rador     -0.74
nels      -0.73
olan      -0.73
rieg      -0.72
ERAL      -0.72
eer       -0.71
night     -0.70
Positive Logits
Token     Logit
puter     0.90
irrad     0.88
bodily    0.85
fluids    0.83
ancest    0.80
dexter    0.78
disarm    0.75
incapac   0.73
awa       0.73
injury    0.72
Activations Density 0.022%