    Explanations

    words indicating completion or finality

    Negative Logits
    Token      Logit
    "lvl"      -0.58
    "ãĥ«"      -0.57
    "obal"     -0.54
    "aux"      -0.54
    "alky"     -0.53
    "hops"     -0.52
    "ols"      -0.51
    "urs"      -0.50
    "ãĥİ"      -0.49
    "KING"     -0.49
    Positive Logits
    Token      Logit
    ".["       1.00
    "!."       0.95
    ".—"       0.93
    ";"        0.93
    ",—"       0.92
    "!"        0.91
    "!,"       0.90
    ".("       0.90
    "."        0.88
    " ;)"      0.88
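
Two of the negative-logit tokens, ãĥ« and ãĥİ, look like encoding damage but are more likely byte-level BPE token strings. The sketch below assumes the feature comes from a model using the GPT-2 style byte-to-unicode mapping, which the page does not state; the helper names `gpt2_byte_decoder` and `decode_token` are hypothetical and only illustrate how such strings can be mapped back to readable text.

```python
def gpt2_byte_decoder():
    """Build the inverse of the GPT-2 style byte-to-unicode table.

    Assumption: the tokenizer uses the GPT-2 byte-level scheme, where
    printable bytes map to themselves and all other bytes map to code
    points starting at 256.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    extra = 0
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + extra)
            extra += 1
    # Map each display character back to the raw byte it stands for.
    return {chr(c): b for b, c in zip(byte_values, code_points)}


def decode_token(token):
    """Turn a byte-level token string into the UTF-8 text it encodes."""
    decoder = gpt2_byte_decoder()
    raw = bytes(decoder[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĥ«"))  # ル
print(decode_token("ãĥİ"))  # ノ
```

Under that assumption, both tokens are single katakana characters rather than corrupted text, so they are best left in the table as listed.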
    Activation Density: 0.641%
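
As a rough illustration of what a density figure like this measures, the sketch below computes the fraction of tokens on which a feature fires. It assumes density means "share of sampled tokens with activation above zero"; the function name `activation_density`, the threshold, and the toy data are hypothetical and not taken from the page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is
    strictly greater than the threshold (zero by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data: 641 active tokens out of 100,000 sampled tokens.
toy_activations = [1.0] * 641 + [0.0] * 99_359
density = activation_density(toy_activations)
print(f"{density:.3%}")  # 0.641%
```

Read this way, a density of 0.641% corresponds to the feature firing on roughly 6 or 7 tokens out of every 1,000.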

    No Known Activations