INDEX
    Explanations

    instances of negative or critical language

    parts of compound words

    Negative Logits
    ftagPool            -0.64
    gradu               -0.50
                        -0.48
    copg                -0.45
     bağı               -0.45
    cessite             -0.45
    jsii                -0.45
    addCriterion        -0.45
    нгред               -0.45
    ésult               -0.44

    Positive Logits
    SuspendLayout       0.46
    NameInMap           0.42
    ]")]                0.40
     يتيمه              0.40
    LookAnd             0.39
     Karlsson           0.39
    calyptic            0.39
     चीज़ों              0.38
     sih                0.38
     "../../../../      0.36
    Activation Density: 0.060%

    No Known Activations