    Negative Logits
    ancial      -0.09
    itiers      -0.09
    ountain     -0.08
    ially       -0.08
    ည်          -0.08
    mittelt     -0.07
    incare      -0.07
    agrance     -0.07
    ummy        -0.07
    atherapy    -0.07
    Positive Logits
     birbir      0.11
     gemeinsam   0.09
     camar       0.09
     সবাই        0.09
     worried     0.09
     elkaar      0.08
     teammate    0.08
     herd        0.08
     ఇద్ద        0.08
     осв         0.08
    Activation Density: 0.099%

    No Known Activations