    Negative Logits
    Token           Logit
    " Pathways"     -0.87
    " rois"         -0.86
    " ert"          -0.85
    " 使っ"          -0.85
    " auffi"        -0.84
    "pppp"          -0.82
    " テープ"        -0.82
    "passengers"    -0.81
    " melody"       -0.81
    " Discrete"     -0.81
    Positive Logits
    Token           Logit
    " fancy"        2.11
    "fancy"         1.83
    " Fancy"        1.66
    "Fancy"         1.59
    "pants"         1.27
    "work"          1.15
    " fancies"      1.15
    " pants"        1.05
    " fanciful"     1.04
                    1.02
    Activation Density: 0.011%

    No Known Activations