INDEX
    Explanations

    prepositions

    Negative Logits
    Token            Logit
    "=wx"            -0.08
    "_df"            -0.08
    "ово"            -0.07
    " Advocate"      -0.07
    "자를"           -0.07
    " accumulate"    -0.07
    "workspace"      -0.07
    "ships"          -0.07
    "uddenly"        -0.07
    (not shown)      -0.07
    Positive Logits
    Token            Logit
    (not shown)      0.08
    "<K"             0.07
    "配上"           0.06
    "kont"           0.06
    " Bah"           0.06
    (not shown)      0.06
    "kker"           0.06
    " '&"            0.06
    " stew"          0.06
    ">;↵↵"           0.06
    Activation Density: 0.195%

    No Known Activations