INDEX
    Explanations

    phrases that suggest causation or responsibility related to societal issues

    NEGATIVE LOGITS
     implications
    -0.17
     DependencyProperty
    -0.16
     repercussions
    -0.15
    oise
    -0.15
    impact
    -0.14
     DEFINE
    -0.14
    าศ
    -0.14
    amba
    -0.14
    merce
    -0.14
    lip
    -0.14
    POSITIVE LOGITS
     why
    0.38
    why
    0.29
     recent
    0.26
     Why
    0.25
     observed
    0.25
    为什么
    0.24
     success
    0.24
    Why
    0.24
     WHY
    0.23
     почему
    0.23
    Act Density 0.241%

    No Known Activations