INDEX
    Explanations

    references to technological interventions and their implications in society

    Negative Logits
    isters      -0.16
    nist        -0.15
    uitka       -0.15
     Nielsen    -0.15
     packing    -0.15
     hor        -0.15
    hor         -0.15
     Goldman    -0.14
     Packing    -0.14
    otes        -0.14
    Positive Logits
    ebra        0.16
    _dash       0.15
    celik       0.15
    zdy         0.15
    ameleon     0.15
    urry        0.14
    ovíd        0.14
    eyJ         0.14
    ialect      0.14
    .twig       0.14
    Act Density 0.025%

    No Known Activations