    Explanations

    contrasting traditional with new

    Negative Logits (tokens quoted; a leading space is part of the token)
      "lisi"          0.37
      "Param"         0.35
      " behest"       0.35
      " hatta"        0.35
      " auch"         0.35
      "paramet"       0.35
      " multip"       0.34
      "Regardless"    0.34
      " bądź"         0.33
      "bele"          0.33
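    Positive Logits
      " Previously"      1.08
      " previously"      1.03
      "Previously"       0.98
      " Whereas"         0.97
      "previously"       0.96
      "以前"             0.95   (Japanese/Chinese: "previously, before")
      "従来の"           0.95   (Japanese: "conventional, traditional")
      "従来"             0.93   (Japanese: "conventionally, hitherto")
      " Unlike"          0.92
      " traditionally"   0.92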
    Activation density: 0.275%

    No Known Activations