    Explanations
    No Explanations Found
    Negative Logits
        " Mehran"      -0.68
        "IDENT"        -0.67
        "IVERS"        -0.67
        "CAST"         -0.66
        "NER"          -0.65
        "Reviewer"     -0.63
        "PAR"          -0.63
        "CHAR"         -0.63
        " befriend"    -0.62
        " Plot"        -0.62
    Positive Logits
        "rums"          0.70
        "theless"       0.70
        "lain"          0.70
        "oliath"        0.69
        "arta"          0.69
        "iku"           0.68
        "bm"            0.68
        "thur"          0.67
        "ly"            0.67
        "esty"          0.66
    Activation Density: 0.000%

    No Known Activations

    This feature has no known activations.