    Negative Logits
     RAILROAD       -0.80
     DONOR          -0.61
     HONORABLE      -0.61
     vendar         -0.60
    municipi        -0.60
     WALTZ          -0.60
     CONDUIT        -0.59
    Secara          -0.59
    zwungen         -0.58
    Whoosh          -0.58
    Positive Logits
     tutt           0.73
     parteci        0.73
     abbra          0.69
     indietro       0.67
     fortun         0.64
     déput          0.64
     UNT            0.62
     dietro         0.62
     nemmeno        0.62
     tranquillo     0.61
    Activation Density: 0.058%

    No Known Activations