    Explanations

    references to specific individuals involved in reporting or activism

    Negative Logits
    Token       Logit
    "ìķ½"       -0.15
    "rox"       -0.14
    "ów"        -0.14
    "ój"        -0.14
    "atak"      -0.13
    "ÑĤÑĮ"      -0.13
    "że"        -0.13
    "è¡Ĩ"       -0.13
    "isel"      -0.13
    ".Names"    -0.13
    Positive Logits
    Token       Logit
    " Inner"    0.32
    "Inner"     0.28
    ".Inner"    0.25
    "inner"     0.24
    " inner"    0.23
    "-inner"    0.23
    " Outer"    0.21
    " INNER"    0.20
    ".inner"    0.20
    "(inner"    0.20
    Act Density 0.000%

    No Known Activations