INDEX
Explanations
phrases indicating violent actions occurring in a continuous context
Negative Logits
Ambro     -0.82
Random    -0.64
Genocide  -0.62
Ukrain    -0.62
ウス      -0.60
Verd      -0.59
Rye       -0.58
Pwr       -0.58
Canaver   -0.58
Clever    -0.57
Positive Logits
jour       0.81
serving    0.76
carry      0.71
enjoying   0.66
ploy       0.66
GO         0.65
esc        0.64
oping      0.63
bicy       0.63
stationed  0.63
Activations Density 0.026%