INDEX
    Explanations

    discussions related to extremism and its impact on perception and behavior

    Negative Logits
    Token             Logit
    "backward"        -0.15
    " Expo"           -0.15
    " illegally"      -0.14
    "efd"             -0.14
    " Enhancement"    -0.14
    "spor"            -0.14
    "iment"           -0.13
    "líč"             -0.13
    " Tester"         -0.13
    " intrig"         -0.13
    Positive Logits
    Token             Logit
    " norms"          0.18
    " metrics"        0.17
    " incentives"     0.17
    " incentiv"       0.17
    " metric"         0.17
    " feedback"       0.16
    "norm"            0.16
    "ients"           0.15
    " Metric"         0.15
    "tsy"             0.15
    Activation Density: 0.006%

    No Known Activations