INDEX
    Explanations

    toxicity/exacerb

    Negative Logits
    Token                  Logit
    " exaggeration"        -0.78
    " exaggerate"          -0.73
    " exagger"             -0.66
    " exag"                -0.61
    " exager"              -0.60
    " exaggerated"         -0.59
    "Spoljašnje"           -0.57
    "CodedInputStream"     -0.57
    " exaggerating"        -0.55
    " Waray"               -0.54
    Positive Logits
    Token                  Logit
    " חיצוניים"            0.71
    "rungsseite"           0.66
    " ujednoznacz"         0.63
    "tagHelperRunner"      0.62
    ":✨"                   0.59
    "Gön"                  0.53
    "Personensuche"        0.52
    " فريبيس"              0.52
    " ProtoMessage"        0.51
    " AssemblyCulture"     0.50
    Act Density 0.031%

    No Known Activations