INDEX
    Explanations
    NEGATIVE LOGITS
    Token           Logit
    " Wrestling"    -0.07
    ".NONE"         -0.07
    "security"      -0.07
    (blank)         -0.07
    "ertility"      -0.07
    " fertility"    -0.06
    ");↵"           -0.06
    "choices"       -0.06
    "女性"          -0.06
    "tbody"         -0.06
    POSITIVE LOGITS
    Token           Logit
    " antique"      0.12
    " Antique"      0.07
    "unk"           0.07
    " convention"   0.07
    " diamonds"     0.06
    (blank)         0.06
    " право"        0.06
    " paras"        0.06
    "â"             0.06
    " incon"        0.06
    Act Density 0.002%

    No Known Activations