INDEX
    Explanations

    negative descriptors and insults related to individuals or groups

    NEGATIVE LOGITS
    TagMode             -0.55
    (token missing)     -0.49
    ánd                 -0.47
    ChildIndex          -0.42
    דו                  -0.42
    iciens              -0.41
    \|\                 -0.40
    isel                -0.38
     kend               -0.38
    dataSet             -0.38
    POSITIVE LOGITS
     crap               0.88
     للمعارف            0.87
    StructEnd           0.86
    ThroughAttribute    0.86
    protoc              0.85
     ProtoMessage       0.84
     bullshit           0.83
     morons             0.82
    发表于              0.82
     idiotic            0.81
    Activation Density 0.454%

    No Known Activations