    NEGATIVE LOGITS
    "<bos>"       -1.22
    "archiviato"  -0.70
    "GIH"         -0.70
    " oprot"      -0.63
    "makeText"    -0.61
    (not shown)   -0.61
    " ffilmiau"   -0.61
    "map"         -0.60
    "jdbc"        -0.60
    " Mike"       -0.60
    POSITIVE LOGITS
    " 🤣🤣"    1.58
    " ftu"     1.49
    " hairc"   1.48
    " milf"    1.47
    " ecru"    1.45
    " maneu"   1.40
    " affor"   1.36
    " ftre"    1.35
    " eiffel"  1.33
    " fta"     1.33
    Activation density: 1.011%

    No Known Activations