    Explanations
    Negative Logits
    Token              Logit
    " pleaſure"        -0.52
    " Serikat"         -0.52
    " Équipe"          -0.50
    " itſelf"          -0.50
    " myſelf"          -0.49
    " enfans"          -0.47
    " webbplats"       -0.47
    " RIPRODUZIONE"    -0.47
    "JvmStatic"        -0.47
    " purpoſe"         -0.46
    Positive Logits
    Token              Logit
    " en"              1.11
    " in"              0.90
    " En"              0.80
    " în"              0.75
    " In"              0.73
    "En"               0.71
    " в"               0.65
    "In"               0.61
    " IN"              0.60
    " EN"              0.58
    Activation Density: 0.004%

    No Known Activations