INDEX
    Explanations
    Negative Logits
     hjem         -0.07
     erotici      -0.07
    мент          -0.06
    .Points       -0.06
    ufreq         -0.06
     haci         -0.06
    acent         -0.06
     Mats         -0.06
     SAC          -0.06
    icot          -0.06
    Positive Logits
     additional   0.07
     prognosis    0.07
     pert         0.07
     airplane     0.07
     Epidemi      0.07
    ını           0.07
     kernel       0.07
    \">\          0.07
    Publisher     0.07
    OCK           0.07
    Activation Density: 0.000%

    No Known Activations