Explanations
words and phrases associated with descriptions of violent events and their consequences
Negative Logits
"/";↵      -0.15
@"";↵      -0.14
=@      -0.13
à¤Ĩà¤ĸ      -0.13
../../../      -0.13
'/';↵      -0.13
@"\      -0.13
ä½ķãģĭ      -0.12
sterol      -0.12
SCI      -0.12
Positive Logits
"      0.43
'      0.38
«      0.31
“      0.30
\"      0.29
`      0.28
``      0.27
ãĢĮ      0.27
‘      0.26
''      0.24
Activation Density: 0.225%