INDEX
    Negative Logits
    Token        Logit
    " ancest"    -0.77
    "ymes"       -0.73
    "ancy"       -0.69
    "detail"     -0.66
    "Magikarp"   -0.65
    " unpre"     -0.63
    "lihood"     -0.61
    "heid"       -0.60
    "icular"     -0.60
    " Rooney"    -0.59
    Positive Logits
    Token               Logit
    "å¹"                0.71
    "wark"              0.69
    "-'"                0.68
    "iyah"              0.68
    " Reconstruction"   0.67
    "yrinth"            0.66
    " Olympics"         0.65
    "é¾"                0.65
    "bit"               0.64
    "irection"          0.63
    Activation Density: 0.023%

    No Known Activations