    Negative Logits
    Token         Logit
    " exprim"     -0.07
    "ant"         -0.07
    "let"         -0.07
    " let"        -0.07
    " nourrit"    -0.07
    " abrazo"     -0.07
    " Pops"       -0.07
    " rounds"     -0.07
    " Jacob"      -0.06
    " str"        -0.06
    Positive Logits
    Token         Logit
    "(single"     0.13
    " einzigen"   0.12
    ".Single"     0.12
    "-single"     0.11
    " einzige"    0.11
    " naanị"      0.11
    "_single"     0.10
    " Single"     0.10
    "ingle"       0.10
    ".single"     0.10
    Activation Density: 0.030%

    No Known Activations