    Explanations
    No Explanations Found
    Negative Logits
    "llo"            -0.69
    " grou"          -0.65
    "ername"         -0.63
    " remark"        -0.60
    " comma"         -0.60
    " forbidden"     -0.60
    " tem"           -0.59
    " prohibited"    -0.58
    " prank"         -0.58
    " spam"          -0.57
    Positive Logits
    "redients"          0.76
    "soDeliveryDate"    0.74
    "acles"             0.73
    "asta"              0.67
    "ãĤ´ãĥ³"            0.65
    "fixes"             0.64
    " ingred"           0.63
    "abe"               0.63
    "ecd"               0.62
    "CD"                0.62
    Activation Density: 0.000%

    No Known Activations

    This feature has no known activations.