INDEX
    Explanations

    phrases that indicate alternatives or different options

    Negative Logits
    " boyunca"      -0.42
    " obtenido"     -0.42
    " leeftijd"     -0.41
    " těch"         -0.40
    " zejména"      -0.40
    " jednotliv"    -0.40
    " muertes"      -0.39
    " rodillas"     -0.38
    " pokud"        -0.38
    " jäm"          -0.38
    Positive Logits
    " another"      1.11
    "another"       1.09
    "Another"       0.99
    " Another"      0.94
    " ANOTHER"      0.93
    " Otro"         0.79
    "Otra"          0.79
    " Otra"         0.77
    "別の"          0.73
    " otra"         0.71
    Activation Density: 0.010%

    No Known Activations