INDEX
    Explanations

    exploitative or offensive content

    Negative Logits
     slightly    0.52
     একটু    0.46
     facilitation    0.45
     కొన్ని    0.44
     ඔබේ    0.43
     слегка    0.43
     :)    0.43
     légèrement    0.43
     facilitate    0.43
     trochu    0.43
    Positive Logits
    Major    0.51
    No    0.44
    0.43
    major    0.43
    MAJOR    0.43
    ieving    0.42
    Initial    0.42
    THIS    0.41
    Finally    0.41
    THE    0.40
    Activation Density: 1.747%

    No Known Activations