INDEX
    Explanations
    No Explanations Found
    Negative Logits
    Choice        -0.71
    Chair         -0.68
    ogens         -0.67
     Vest         -0.64
     univers      -0.64
    Nusra         -0.63
     Printing     -0.62
     Checking     -0.61
    Arg           -0.61
    onom          -0.61
    Positive Logits
    antz          0.82
     incompet     0.78
    renheit       0.66
    tg            0.65
    aries         0.65
     thirsty      0.64
    ateur         0.62
    irie          0.61
     badly        0.60
    ushima        0.60
    Act Density 0.000%

    No Known Activations

    This feature has no known activations.