INDEX
    Explanations

    spock, superman, boy, unitty

    Negative Logits
     dement     0.57
     sido       0.55
    acidad      0.54
     apoy       0.52
     vuelta     0.51
     puta       0.50
     meestal    0.49
     standout   0.49
     juego      0.48
     wich       0.48
    Positive Logits
    দের         0.64
     the        0.57
    (not shown) 0.55
     a          0.54
    са          0.52
    the         0.52
     škole      0.52
    ي           0.52
    (not shown) 0.52
    па          0.51
    Act Density 0.000%

    No Known Activations