Explanations
phrases related to violent actions
Negative Logits
*/(            -0.79
ãĥĥãĥĪ         -0.71
Cruise         -0.71
specificity    -0.70
nell           -0.69
fleet          -0.68
picture        -0.68
detail         -0.66
Remastered     -0.65
master         -0.64
Positive Logits
utenant        1.15
Angelo         1.08
pton           1.05
ars            1.02
zhou           0.95
jing           0.95
otta           0.93
ying           0.90
cci            0.90
hao            0.88
Activations Density 0.015%