INDEX
    Explanations
    Negative Logits
    n        2.00
    ness     1.97
    nya      1.97
    l        1.85
    م        1.56
    nar      1.47
    r        1.47
    m        1.47
    ly       1.46
    side     1.44
    Positive Logits
    pp       1.61
    ñ        1.55
    ft       1.50
    pped     1.46
    pper     1.44
    ïne      1.44
    ppes     1.42
    ́        1.33
    ff       1.32
    xt       1.28
    Act Density 1.097%

    No Known Activations