INDEX
    Explanations
    NEGATIVE LOGITS
    Token             Logit
    " Xavier"         -0.08
    ".Native"         -0.07
    " one"            -0.07
    " ONE"            -0.07
    " centerpiece"    -0.07
    " Highest"        -0.07
    "/max"            -0.06
    " One"            -0.06
    " McA"            -0.06
    " одной"          -0.06
    POSITIVE LOGITS
    Token             Logit
    " Bur"            0.17
    "Bur"             0.15
    " bur"            0.13
    "burg"            0.13
    " Burke"          0.12
    "bury"            0.11
    "bur"             0.10
    " Burton"         0.09
    "berg"            0.09
    "enburg"          0.09
    Activation Density: 0.017%

    No Known Activations