    Explanations

    protected characteristics and hatred

    NEGATIVE LOGITS
    " percents"         0.44
    " genders"          0.41
    " সড়ক"             0.40
    " procent"          0.39
    " Haem"             0.39
    " personalities"    0.38
    "ğinde"             0.38
    " BIO"              0.38
    " createElement"    0.38
    " etx"              0.38
    POSITIVE LOGITS
    "protected"         0.67
    " protected"        0.64
    "Protected"         0.58
    " hatred"           0.55
    " unfairly"         0.52
    " Protected"        0.52
    "immutable"         0.49
    " unjustly"         0.49
    " grounds"          0.47
    " religion"         0.47
    Activation Density 0.025%

    No Known Activations