    NEGATIVE LOGITS
    Token               Logit
    " remove"           -0.07
    ".cleanup"          -0.07
    "NOTE"              -0.07
    " interruption"     -0.07
    "ンテ"              -0.06
    " Ernst"            -0.06
    " inadequate"       -0.06
    " Benchmark"        -0.06
    "_Att"              -0.06
    " notes"            -0.06
    POSITIVE LOGITS
    Token               Logit
    " Social"           0.11
    "social"            0.09
    " social"           0.09
    "-social"           0.08
    "Social"            0.08
    " socioeconomic"    0.08
    " CSR"              0.08
    " gossip"           0.08
    " اجتماع"           0.07
    "human"             0.07
    Activation Density: 0.017%

    No Known Activations