    NEGATIVE LOGITS
    stå       -0.17
    ario      -0.16
    ial       -0.15
    pert      -0.15
    eshire    -0.14
    iangle    -0.14
    allery    -0.14
     bert     -0.14
     exile    -0.13
    411       -0.13
    POSITIVE LOGITS
    apest     0.29
    ovice     0.19
    bud       0.17
    hist      0.17
    olic      0.17
     Bud      0.17
    ges       0.16
    eto       0.15
    enko      0.15
    weis      0.15
    Activation Density: 0.007%

    No Known Activations