INDEX
    Explanations

    phrases that indicate prior actions or events

    Negative Logits
    "KommentareTeilen"    -0.93
    "gdx"                 -0.85
    " habet"              -0.85
    " Talley"             -0.72
    " Messer"             -0.69
    "genstein"            -0.68
    "ſelf"                -0.68
    " følge"              -0.67
    "gany"                -0.66
    "ณ์"                   -0.66
    Positive Logits
    "before"      1.87
    " before"     1.86
    " Before"     1.81
    "BEFORE"      1.81
    " BEFORE"     1.74
    "Before"      1.72
    " sebelum"    1.48
    " innan"      1.39
    " befo"       1.37
    " πριν"       1.33
    Activation Density: 0.105%

    No Known Activations