Explanations
phrases related to killing or death
Negative Logits
Token       Logit
onto        -0.19
vid         -0.17
onto        -0.15
land        -0.15
/out        -0.15
ulled       -0.15
oi          -0.15
eward       -0.14
elect       -0.14
iland       -0.14
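Logit lists like this one are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and ranking the vocabulary by the result. The sketch below assumes that method and ignores any final LayerNorm. `logit_effects`, `top_and_bottom`, and the toy vectors are hypothetical illustrations, not this dashboard's own code or data.

```python
from typing import Dict, List, Tuple

def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    # Assumed method: dot product of the feature's decoder direction
    # with each token's unembedding column gives its direct logit effect.
    return {tok: sum(d * u for d, u in zip(decoder_dir, col)) for tok, col in unembed.items()}

def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    # Sort ascending so the most negative effects come first.
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive

# Toy 2-dimensional example (hypothetical values).
decoder_dir = [1.0, -0.5]
unembed = {"deer": [0.3, -0.2], "onto": [-0.1, 0.2], "joy": [0.1, 0.0]}
neg, pos = top_and_bottom(logit_effects(decoder_dir, unembed), k=1)
print(neg, pos)
```

With real model weights, `k` would be 10 to match the lists shown here.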
Positive Logits
Token                       Logit
/disable                    0.20
ibri                        0.19
spree                       0.19
deer                        0.19
现场 (raw: çİ°åľº)           0.17
joy                         0.17
switch                      0.16
æĪ (incomplete UTF-8)        0.16
abyrinth                    0.16
ड़ (raw: à¥ľ)                0.16
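Some tokenizers (GPT-2-style byte-level BPE is assumed here) display non-ASCII tokens as byte-mapped strings, e.g. the raw form çİ°åľº for 现场 and à¥ľ for ड़. A minimal sketch of reversing that mapping, assuming the GPT-2 byte-to-unicode table; `decode_token` is a hypothetical helper name. Tokens that end mid-character, such as æĪ, only cover part of a UTF-8 sequence and decode to a replacement character.

```python
from typing import Dict

def bytes_to_unicode() -> Dict[int, str]:
    # GPT-2-style table: printable bytes map to themselves,
    # the remaining bytes map to code points starting at 256.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))

def decode_token(token: str) -> str:
    # Invert the table, rebuild the raw bytes, then decode as UTF-8.
    # Incomplete multi-byte sequences become U+FFFD instead of raising.
    byte_decoder = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(byte_decoder[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")

print(decode_token("çİ°åľº"))  # → 现场
```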
Activation density: 0.057%
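Activation density is usually read as the fraction of sampled token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero; the real threshold and sample dataset behind the figure are not stated here, and `activation_density` is a hypothetical helper. On that reading, 0.057% corresponds to roughly 57 firing positions per 100,000 tokens.

```python
from typing import List

def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    # Fraction of token positions whose activation exceeds the threshold.
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)

# Hypothetical sample: 57 firing positions out of 100,000 tokens.
acts = [1.0] * 57 + [0.0] * (100_000 - 57)
print(f"{activation_density(acts):.3%}")  # → 0.057%
```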