INDEX
    Explanations

    beginnings of compound words

    Negative Logits
    ad      1.28
    an      1.14
    ir      1.13
    та      1.09
    z       1.01
    the     0.98
    w       0.98
    u       0.95
    is      0.89
    us      0.89
    Positive Logits
    (token not shown)   0.68
     bör                0.63
    (token not shown)   0.63
    টি                  0.61
    ↵↵                  0.61
    (token not shown)   0.60
    Ι                   0.58
    க்                  0.57
    lüğ                 0.54
    (token not shown)   0.54
    Act Density 4.074%

    No Known Activations