INDEX
    Explanations
    NEGATIVE LOGITS
    ますが  2.20
    ます  2.14
    nél  1.77
    (blank)  1.71
    nál  1.66
    (blank)  1.63
    tile  1.57
    nahme  1.46
     گیری  1.46
    tar  1.45
    POSITIVE LOGITS
     stains  1.68
    🧼  1.68
     cleanser  1.63
     cleansing  1.60
     vint  1.56
     errands  1.55
     washing  1.54
    liness  1.54
     sanitation  1.54
     cleans  1.52
    Act Density 0.116%

    No Known Activations
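
The Act Density figure above is most naturally read as the share of token positions on which the feature fires. The sketch below illustrates that reading under stated assumptions: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from the dashboard or from any particular library.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means (number of active positions) / (total positions).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 12 of which activate.
sample = [0.0] * 9988 + [0.7] * 12
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.120%
```

On this reading, a density of 0.116% would correspond to the feature firing on roughly 1 in every 860 tokens.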