INDEX
    Negative Logits
    Token       Logit
    ¿½          -2.38
    ĺ           -2.28
    Ļª          -2.17
    Ļ           -2.10
    ĥ½          -2.10
    ĨĴ          -2.09
    »¿          -1.99
    ŀ           -1.86
    ı           -1.86
    į           -1.82
    Positive Logits
    Token       Logit
    ieux        2.00
    enstein     1.92
    ucci        1.90
    ilee        1.89
    ulence      1.89
    opan        1.83
    issance     1.83
    pool        1.76
    ulent       1.75
    stein       1.74
    Activation Density: 0.009%

    No Known Activations