INDEX
    Explanations

    phrases related to violent actions

    Negative Logits
    Token             Logit
    "*/("             -0.79
    "ãĥĥãĥĪ"          -0.71
    " Cruise"         -0.71
    " specificity"    -0.70
    "nell"            -0.69
    "fleet"           -0.68
    "picture"         -0.68
    "detail"          -0.66
    " Remastered"     -0.65
    "master"          -0.64
    Positive Logits
    Token             Logit
    "utenant"         1.15
    "Angelo"          1.08
    "pton"            1.05
    "ars"             1.02
    "zhou"            0.95
    "jing"            0.95
    "otta"            0.93
    "ying"            0.90
    "cci"             0.90
    "hao"             0.88
    Activation Density: 0.015%

    No Known Activations