INDEX
    Explanations

    noun followed by verb/preposition

    Negative Logits
    Token         Value
    ":"           1.15
    "the"         1.02
    "The"         0.72
    "ه"           0.65
    "のカ"        0.63
    ";"           0.63
    "four"        0.63
    " the"        0.62
    (blank)       0.62
    ":“"          0.61

    Positive Logits
    Token         Value
    (blank)       0.59
    " I"          0.58
    "ILL"         0.58
    " Jahr"       0.58
    " on"         0.51
    " Bezirk"     0.51
    "ON"          0.49
    (blank)       0.49
    " Adoles"     0.48
    " Bezir"      0.48

    Act Density 1.303%

    No Known Activations