    NEGATIVE LOGITS
     it         -0.09
     they       -0.07
     we         -0.07
    "I          -0.07
     he         -0.07
    (cells      -0.06
    itori       -0.06
     It         -0.06
     she        -0.06
     I          -0.06
    POSITIVE LOGITS
     whose      0.11
    whose       0.11
     mutant     0.07
     Flexible   0.06
     Whole      0.06
    _nome       0.06
     İlk        0.06
     binds      0.06
     countless  0.06
     những      0.06
    Activation Density: 0.007%

    No Known Activations