INDEX
    Explanations
    Negative Logits
    ?,        1.05
    +,        0.95
    !,        0.90
    ・・・。    0.89
    ..,       0.89
    (),       0.88
    ٸ         0.84
    。。。。    0.82
    =?,       0.81
    ؟.        0.79
    Positive Logits
     although      1.26
     moreover      1.23
     whereupon     1.06
     fortunately   1.00
     whereas       0.97
     however       0.97
     furthermore   0.95
     let           0.95
     albeit        0.95
    然而           0.93
    Act Density 0.042%

    No Known Activations