INDEX
    Explanations

    geopolitical terms and entities, especially related to countries and political figures

    references to North Korea and its allies

    Negative Logits
    %.      -0.78
    .",     -0.76
    "!      -0.75
    .:      -0.75
    .<      -0.74
    !".     -0.73
    .(      -0.71
    ."      -0.71
    ."[     -0.69
    ".      -0.69

    Positive Logits
    *)      1.10
     )]     1.06
     ?)     0.98
    )}      0.95
    )]      0.92
    ?)      0.91
    )\      0.87
    })      0.86
    )|      0.83
    -)      0.83
    Activation Density: 2.100%

    No Known Activations