INDEX
    Explanations

    punctuation

    Negative Logits
    (unrendered token)   -0.07
    "704"                -0.07
    " Bent"              -0.06
    "ใคร"                 -0.06
    "-help"              -0.06
    "_foot"              -0.06
    " baktı"             -0.06
    "ifier"              -0.06
    (unrendered token)   -0.06
    (unrendered token)   -0.06

    Positive Logits
    "ystery"             0.06
    (unrendered token)   0.06
    ".services"          0.06
    " homophobic"        0.06
    "광역시"              0.06
    " concl"             0.06
    "aryawan"            0.06
    "icontains"          0.06
    " squares"           0.06
    " overlays"          0.06

    Activation Density: 0.028%

    No Known Activations