Explanations
mentions of physical or emotional torment
instances of the word "tor" and related forms in various contexts
Negative Logits
sheets       -0.73
compliance   -0.68
enforcement  -0.68
Keefe        -0.65
WER          -0.64
ortment      -0.64
Dakota       -0.63
endif        -0.63
)=(          -0.62
enegger      -0.62
Positive Logits
onto         1.14
rent         0.90
mented       0.89
tor          0.86
rance        0.84
atra         0.82
ques         0.79
seys         0.76
roying       0.76
chest        0.75
Activations Density: 0.010%