INDEX
    Explanations

    phrases that indicate societal and environmental issues

    Negative Logits
    actionDate    -0.15
    åζ            -0.14
    aven          -0.14
    Ñĥма          -0.14
    446           -0.14
    olt           -0.14
    avery         -0.14
    iggins        -0.14
     åζ           -0.13
    ilar          -0.13
    Positive Logits
    (blank)       0.24
     '            0.23
    (blank)       0.22
     "            0.20
     «            0.18
     dreaded      0.16
     `            0.16
     â            0.15
     hidden       0.15
     \"           0.15
    Act Density 0.255%

    No Known Activations