INDEX
    Explanations
    Negative Logits
    " spoken"    -0.06
    "_est"       -0.06
    " mat"       -0.06
    "하고"        -0.06
    "šem"        -0.06
    " الشر"      -0.06
    "mode"       -0.06
    ".box"       -0.06
    " Burst"     -0.06
    " tribes"    -0.06
    Positive Logits
    " Tea"       0.07
    " Dad"       0.07
    " planner"   0.07
    "noon"       0.07
    "]]↵"        0.06
    "دان"        0.06
    " отлич"     0.06
    " dad"       0.06
    " Ballet"    0.06
    " Lori"      0.06
    Activation Density: 0.002%

    No Known Activations