    Explanations

    words related to legal or political issues

    Negative Logits
     Niet       -0.72
     Seym       -0.72
     Moroc      -0.70
     Instr      -0.67
     Rica       -0.66
     Fas        -0.62
    ãĥ¼ãĥĨ      -0.62
     Berm       -0.61
     advoc      -0.61
     shenan     -0.58
    Positive Logits
     )))        0.71
     );         0.67
     };         0.65
     ][         0.64
     ));        0.61
    "?          0.60
     ());       0.59
     ))         0.59
     ·          0.59
     });        0.59
    Activation Density: 0.278%

    No Known Activations