INDEX
    NEGATIVE LOGITS
    "+#+#"                -0.83
    " AssemblyCulture"    -0.77
    "]--;"                -0.77
    " acorn"              -0.76
    "Cześć"               -0.76
    "?—"                  -0.76
    " bonjour"            -0.75
    "KommentareTeilen"    -0.74
    " domestiques"        -0.74
    "?''"                 -0.74
    POSITIVE LOGITS
    "!!!"                 0.81
    " ??"                 0.67
    " !!!"                0.65
    "custom"              0.63
    "***"                 0.60
    "???"                 0.56
    "BufferException"     0.56
    " ???"                0.55
    "!!"                  0.54
    " custom"             0.52
    Act Density 0.114%

    No Known Activations