INDEX
    Explanations
    NEGATIVE LOGITS
    Token           Logit
    " drying"       -0.06
    "YS"            -0.06
    "={[↵"          -0.06
    "Johnson"       -0.06
    " Trees"        -0.06
    "act"           -0.06
    " clot"         -0.06
    " Bryan"        -0.06
    " Bootstrap"    -0.06
    " آس"           -0.06
    POSITIVE LOGITS
    Token           Logit
    " perfume"      0.12
    " poetry"       0.07
    " лю"           0.07
    "rape"          0.07
    " música"       0.07
    " perf"         0.06
    " tobacco"      0.06
    " позвол"       0.06
    "prm"           0.06
    " ev"           0.06
    Activation Density: 0.003%

    No Known Activations