    Explanations

    references to mistreatment or abuse in relation to visa status

    Negative Logits
     ('                     -0.32
    (token not rendered)    -0.32
     ("                     -0.29
     (~                     -0.27
     (                      -0.26
     («                     -0.25
    (token not rendered)    -0.23
     '                      -0.23
    …"                      -0.23
     (&                     -0.22
    Positive Logits
    --↵                      0.31
    ----↵                    0.30
    ....↵                    0.27
    -----↵                   0.24
    ....                     0.23
    ......                   0.23
    ....↵↵                   0.22
    .....↵↵                  0.22
    ---↵                     0.21
    —↵↵                      0.20
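    The two lists above rank vocabulary tokens by how strongly the feature pushes their logits up or down. A minimal sketch of how such lists are commonly produced, assuming the scores come from projecting the feature's decoder direction through the model's unembedding matrix (this dashboard does not state its exact method, and real pipelines may add normalization steps). The names `decoder_dir`, `W_U`, `vocab` and `logit_effects` are hypothetical.

```python
def logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by the feature's direct effect on their logits.

    Assumption: effect[t] = decoder_dir . W_U[:, t] (no normalization).
    decoder_dir: list of d_model floats (hypothetical feature direction)
    W_U: d_model rows, each a list of len(vocab) floats (hypothetical unembedding)
    vocab: list of token strings
    Returns (top_positive, top_negative), each a list of (token, score).
    """
    d_model = len(decoder_dir)
    # Dot product of the decoder direction with each token's unembedding column.
    effects = [
        sum(decoder_dir[i] * W_U[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    pairs = list(zip(vocab, effects))
    # Largest scores first for the positive list, smallest first for the negative list.
    top_positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    top_negative = sorted(pairs, key=lambda p: p[1])[:k]
    return top_positive, top_negative


if __name__ == "__main__":
    # Toy 2-dimensional model with a 3-token vocabulary (made-up numbers).
    pos, neg = logit_effects(
        [1.0, 0.0],
        [[0.3, -0.2, 0.1], [5.0, 5.0, 5.0]],
        ["--\n", " ('", "...."],
        k=2,
    )
    print(pos)
    print(neg)
```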
    Activation Density: 0.005%
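    Activation density is usually read as the share of token positions on which the feature is active. A minimal sketch, assuming "active" means an activation strictly greater than zero; the function name `activation_density_percent` is hypothetical.

```python
def activation_density_percent(activations):
    """Percentage of token positions where the feature fires.

    Assumption: a position counts as firing when its activation is > 0.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # One firing position out of 20,000 tokens.
    acts = [0.0] * 19_999 + [1.2]
    print(activation_density_percent(acts))
```

    By this arithmetic, a density of 0.005% corresponds to roughly one firing per 20,000 tokens.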

    No Known Activations