    Explanations
    No Explanations Found
    Negative Logits
    "Naz"         -0.75
    " veterin"    -0.67
    " Tome"       -0.67
    "ogy"         -0.67
    " rapists"    -0.64
    "hai"         -0.64
    "bara"        -0.62
    " Mao"        -0.62
    " cous"       -0.61
    " Thib"       -0.61
    Positive Logits
    "atten"       0.78
    "op"          0.76
    "opol"        0.74
    "leeve"       0.72
    "ooth"        0.71
    "orial"       0.69
    "lay"         0.68
    "chip"        0.67
    "break"       0.66
    "meet"        0.65
    Act Density 0.000%

    No Known Activations

    This feature has no known activations.