    Negative Logits
        " racist"           -0.07
        " welding"          -0.07
        " Sexe"             -0.06
        ".sent"             -0.06
        " ragaz"            -0.06
        "]initWithFrame"    -0.06
        " widest"           -0.06
        " tester"           -0.06
        "()];↵"             -0.06
        "ILITY"             -0.06
    Positive Logits
        "\AppData"          0.06
        " sống"             0.06
        "192"               0.06
        "min"               0.06
                            0.06
        ".pos"              0.06
        " boyut"            0.06
        "ansi"              0.06
        ".feedback"         0.06
        "lij"               0.06
    Activation density: 0.009%

    No Known Activations