    NEGATIVE LOGITS
    Token             Logit
    "ς"               2.03
    "s"               1.81
    "ের"              1.70
    "iyor"            1.62
    "I"               1.61
    "を中心"          1.59
    (not shown)       1.59
    (not shown)       1.54
    " 但是"           1.52
    "を引き"          1.52
    POSITIVE LOGITS
    Token             Logit
    "ية"              2.56
    "و"               2.16
    " matemático"     2.14
    "ة"               2.11
    "é"               1.91
    "۹"               1.88
    "<bos>"           1.84
    "izes"            1.81
    " Tots"           1.80
    " delanter"       1.78
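Token lists like the two above are commonly produced, in sparse-autoencoder feature dashboards, by projecting the feature's decoder direction through the model's unembedding matrix and keeping the k largest and k smallest entries. The following is a minimal sketch of that selection step, using plain Python lists and toy numbers. The function names `logit_effects` and `top_logits`, and the assumption that `w_u` is laid out as `w_u[i][v]` (model dimension by vocabulary), are illustrative choices, not this dashboard's actual code. The sketch returns signed values.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  w_u: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction onto the unembedding.

    Assumed layout: w_u[i][v] is row i of the model dimension and
    column v of the vocabulary, so the result has one value per token.
    """
    vocab_size = len(w_u[0])
    return [
        sum(decoder_dir[i] * w_u[i][v] for i in range(len(decoder_dir)))
        for v in range(vocab_size)
    ]


def top_logits(effects: Sequence[float], tokens: Sequence[str], k: int
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top-k most positive, top-k most negative) token effects."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # smallest (most negative) first
    positive = ranked[::-1][:k]    # largest (most positive) first
    return positive, negative


if __name__ == "__main__":
    # Toy example: 2-dimensional model, 3-token vocabulary (hypothetical data).
    effects = logit_effects([1.0, -1.0], [[2.0, 0.0, 1.0], [0.0, 3.0, 1.0]])
    pos, neg = top_logits(effects, ["a", "b", "c"], k=1)
    print("positive:", pos)
    print("negative:", neg)
```

Sorting the full vocabulary is simple but costs O(V log V). For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` / `heapq.nsmallest` would avoid the full sort.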
    Activation density: 0.027%
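Activation density is usually reported as the share of sampled tokens on which the feature is active. At 0.027%, that corresponds to roughly 27 of every 100,000 tokens. The following is a minimal sketch, assuming "active" means an activation strictly above a threshold of 0. The helper name `activation_density` and the threshold default are hypothetical, and the dashboard's own sampling may differ.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: 27 active tokens out of 100,000.
    sample = [0.0] * (100_000 - 27) + [1.5] * 27
    print(f"{activation_density(sample):.3f}%")
```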

    No known activations.