INDEX
    Explanations
    Negative Logits
    Token        Logit
    " Schmid"    -0.54
    "spedes"     -0.53
    "ising"      -0.53
    " Fong"      -0.51
    "avas"       -0.49
    " Fabian"    -0.49
    "uris"       -0.49
    "amond"      -0.48
    "asan"       -0.48
    "idal"       -0.48
    Positive Logits
    Token        Logit
    "Take"       1.56
    " take"      1.54
    "take"       1.52
    " Take"      1.49
    " TAKE"      1.32
    "TAKE"       1.22
    " takes"     0.92
    "takes"      0.91
    " taken"     0.83
    "taken"      0.83
    Activation Density: 0.017%

    No Known Activations