INDEX
    Explanations

    phrases with the prefix "le-" followed by numbers

    Negative Logits
    Token         Logit
    ãĥ¼ãĥĨãĤ£     -0.85
    ����          -0.74
    aneers        -0.69
    âĶģ           -0.68
    ĸļ            -0.66
    ãģĨ           -0.65
    ilities       -0.64
    ãĥ¼ãĥĨ        -0.63
    acca          -0.63
    ruary         -0.62
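
    Several negative-logit tokens above (for example ãģĨ) look garbled, but they may simply be raw UTF-8 bytes shown through a byte-level BPE alphabet. The sketch below assumes a GPT-2-style byte-to-character table; the page does not say which tokenizer produced these tokens, so this is an assumption, and the function names are illustrative.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Byte -> printable character table in the style of GPT-2's byte-level BPE.

    Assumption: the dashboard's vocabulary uses this mapping.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    # Printable bytes map to themselves.
    table = {b: chr(b) for b in printable}
    # Remaining bytes are assigned code points 256, 257, ... in byte order.
    shift = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + shift)
            shift += 1
    return table


_CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level token string back into readable text.

    Tokens containing characters outside the table are returned unchanged.
    Incomplete UTF-8 sequences decode to U+FFFD replacement characters.
    """
    if any(ch not in _CHAR_TO_BYTE for ch in token):
        return token
    raw = bytes(_CHAR_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Code points spelled out to avoid copy-paste encoding problems.
    print(decode_token("\u00e3\u0123\u0128"))
    print(decode_token("ilities"))
```

    Tokens that hold only part of a multi-byte character (such as ĸļ) cannot be decoded on their own, so they are best read as fragments of a longer character sequence.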
    Positive Logits
    Token         Logit
    opard         1.23
    isure         1.06
    icester       1.00
    vered         1.00
    vity          1.00
    pid           0.99
    gged          0.99
    gging         0.98
    yton          0.98
    mons          0.97
    Activation Density: 0.018%

    No Known Activations