INDEX
    Explanations
    Negative Logits
    olis      -0.66
    ersive    -0.63
    quished   -0.60
    ostics    -0.60
    chall     -0.60
    ospace    -0.60
    igraph    -0.60
    fman      -0.60
    onite     -0.60
    uve       -0.60
    Positive Logits
    59        1.05
    56        1.00
    58        1.00
    51        0.99
    09        0.99
    54        0.98
    08        0.98
    05        0.98
    53        0.97
    55        0.97
    Activation Density: 0.033%

    No Known Activations