    NEGATIVE LOGITS
    Token                 Logit
    " Ro"                 -0.09
    "-speaking"           -0.08
    " ruin"               -0.07
    " ro"                 -0.07
    (blank in source)     -0.07
    "_send"               -0.07
    " зелен"              -0.07
    " repay"              -0.07
    " repayment"          -0.07
    " #{@"                -0.07
    POSITIVE LOGITS
    Token                 Logit
    "asya"                0.08
    " estética"           0.08
    "Spider"              0.08
    " nisu"               0.08
    " programma"          0.08
    " Erotic"             0.07
    "programma"           0.07
    " contraseña"         0.07
    " Exhib"              0.07
    " Carrera"            0.07
    Activation Density: 0.001%

    No Known Activations