    Explanations
    No Explanations Found
    Negative Logits
    Token           Logit
    "arium"         -0.76
    "Fine"          -0.74
    "Dam"           -0.71
    "ourning"       -0.70
    "ゼウス"        -0.70
    "undo"          -0.70
    "orest"         -0.70
    "uture"         -0.69
    "rosso"         -0.68
    "urga"          -0.67
    Positive Logits
    Token           Logit
    "zb"            0.80
    "henko"         0.72
    " rifle"        0.67
    "sth"           0.66
    "hester"        0.66
    "ussia"         0.65
    " Ballistic"    0.65
    "fters"         0.65
    " explosives"   0.63
    "hops"          0.63
    Activation Density: 0.000%

    This feature has no known activations.