INDEX
    Explanations

    instances of the word "more"

    Negative Logits
    "oved"    -0.08
    " Ped"    -0.06
    "omba"    -0.06
    "agma"    -0.06
    " rud"    -0.06
    "eda"     -0.06
    "oa"      -0.06
    ","       -0.06
    " tr"     -0.05
    "elin"    -0.05
    Positive Logits
    "burgh"     0.08
    " poil"     0.08
    "pok"       0.07
    "_Tis"      0.07
    "åĭ"        0.07
    " cazzo"    0.07
    "_DECLS"    0.07
    "assel"     0.07
    "Äįel"      0.07
    "šli"       0.07
    Act Density 0.001%

    No Known Activations