    Negative Logits
        Token                Logit   Gloss
        "{"                  -2.39
        "1"                  -1.90
        " estrictamente"     -1.88   strictly
        "2"                  -1.84
        (not shown)          -1.79
        " was"               -1.70
        "看到了"              -1.68   saw
        "and"                -1.63
        " vorsichtig"        -1.63   carefully
        " automáticamente"   -1.62   automatically
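    Positive Logits
        Token                Logit   Gloss
        " and"               2.14
        (not shown)          1.91
        " februari"          1.85    February
        " lavabo"            1.80    washbasin
        " brune"             1.80    brown (feminine)
        (not shown)          1.79
        " دیگران"            1.73    others
        " golfe"             1.71    gulf
        " farmacia"          1.69    pharmacy
        " tigre"             1.68    tiger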
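The two lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. A common way to produce such lists, assumed here rather than confirmed by this page, is to multiply the feature's decoder direction by the model's unembedding matrix and keep the k lowest and k highest entries. The sketch below uses plain Python lists; `feature_logits`, `top_logits`, and the toy weights are hypothetical, and a real pipeline may apply layer norm or other corrections before the projection.

```python
from typing import List, Sequence, Tuple


def feature_logits(decoder_dir: Sequence[float],
                   w_unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction through the unembedding.

    decoder_dir has length d_model; w_unembed is a d_model x vocab_size
    matrix given as a list of rows. Returns one logit contribution per token.
    """
    vocab_size = len(w_unembed[0])
    # Column j of w_unembed is the unembedding vector for token j.
    return [
        sum(decoder_dir[i] * w_unembed[i][j] for i in range(len(decoder_dir)))
        for j in range(vocab_size)
    ]


def top_logits(logits: Sequence[float], vocab: Sequence[str], k: int = 10
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most negative, k most positive) (token, logit) pairs."""
    pairs = list(zip(vocab, logits))
    # Ascending sort puts the most suppressed tokens first.
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    # Descending sort puts the most promoted tokens first.
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    return negative, positive


# Toy example with d_model = 2 and a three-token vocabulary (hypothetical values).
vocab = [" and", " was", "{"]
w_u = [[2.0, 0.0, -1.0],
       [0.0, 2.0, 1.0]]
logits = feature_logits([1.0, -0.5], w_u)
negative, positive = top_logits(logits, vocab, k=2)
print(negative)  # [('{', -1.5), (' was', -1.0)]
print(positive)  # [(' and', 2.0), (' was', -1.0)]
```

Keeping the leading space inside each token string matters: " and" and "and" are different vocabulary entries, which is why both can appear in the lists with different values.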
    Activation density: 0.003%

    No known activations.
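Activation density is usually read as the share of sampled token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0 and that the figure is reported as a percentage; `activation_density` is a hypothetical helper. At 0.003%, a feature is active on roughly 3 of every 100,000 tokens, so a modest sample of text can easily contain no activations at all.

```python
from typing import Sequence


def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds threshold."""
    if not activations:
        return 0.0  # no data: report zero rather than divide by zero
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 3 firing positions out of 100,000 tokens.
sample = [0.0] * 99_997 + [1.2, 0.4, 3.1]
print(f"{activation_density(sample):.3f}%")  # 0.003%
```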