INDEX
    Explanations

    phrases indicating beneficial actions or contributions

    Negative Logits
    仲        -0.16
    ングル    -0.15
    ैश        -0.15
    Ѓ         -0.14
    ivan      -0.14
    assis     -0.14
    actus     -0.14
    anco      -0.14
    ας        -0.14
    tiler     -0.14
    Positive Logits
    ilon        0.17
     συμβ       0.15
    imir        0.15
    intendent   0.14
    avir        0.14
    -inline     0.14
    ifo         0.14
    unta        0.13
    jí          0.13
    vig         0.13
    Act Density 0.272%

    No Known Activations