    Explanations

    mentions or references to political figures and events

    references to political events and figures

    Negative Logits
    "Sund"       -0.58
    "reek"       -0.53
    " Hawks"     -0.53
    "Els"        -0.51
    "ンジ"       -0.49
    " tho"       -0.49
    "Nar"        -0.49
    " Samar"     -0.48
    "pac"        -0.48
    " oy"        -0.48
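    Some tokens in these lists come from a byte-level BPE vocabulary and can show up in their byte-mapped form, for example "ãĥ³ãĤ¸" for "ンジ" or "ãĢĤ" for "。". The sketch below assumes a GPT-2-style tokenizer, which those byte-mapped strings suggest but the page does not confirm. It rebuilds the GPT-2 byte-to-character table and inverts it. The function names are illustrative.

```python
from typing import Dict


def bytes_to_unicode() -> Dict[int, str]:
    """GPT-2-style mapping from each byte value to a printable Unicode character."""
    # Printable Latin-1 bytes map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes (control chars, space, etc.) are shifted to code points >= 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: printable character -> original byte value.
BYTE_DECODER = {c: b for b, c in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-mapped token string back into readable text."""
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    # A token that ends partway through a multi-byte character decodes to U+FFFD.
    return raw.decode("utf-8", errors="replace")


for tok in ["ãĥ³ãĤ¸", "ãĢĤ", "Ġtho"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

    The `errors="replace"` choice is likely how an entry such as ".�" arises: a token that carries only the first byte(s) of a multi-byte character cannot be decoded on its own.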
    Positive Logits
    ".''."       1.00
    "*."         0.89
    "。"         0.89
    ".''"        0.80
    "."          0.80
    ".)."        0.79
    ".*"         0.78
    ".("         0.76
    ".�"         0.75
    " attRot"    0.73
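    A common way to produce positive and negative logit lists like these is to project the feature's decoder direction onto the model's unembedding matrix and rank the vocabulary by the result. This page does not say how its values were computed or whether they were rescaled, so treat the following as a minimal sketch under that assumption. The function names and the toy vectors are hypothetical.

```python
from typing import Dict, List, Tuple


def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of a feature's decoder direction with each token's unembedding vector."""
    return {
        tok: sum(d * w for d, w in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_k(effects: Dict[str, float], k: int, positive: bool = True) -> List[Tuple[str, float]]:
    """Return the k tokens with the largest (positive=True) or smallest effects."""
    return sorted(effects.items(), key=lambda kv: kv[1], reverse=positive)[:k]


# Hypothetical toy data: a 3-dimensional residual stream and a 4-token vocabulary.
decoder_dir = [1.0, 0.0, -0.5]
unembed = {
    ".": [0.8, 0.1, 0.0],
    " Hawks": [-0.3, 0.2, 0.5],
    "Sund": [-0.2, 0.0, 0.8],
    " the": [0.1, 0.3, 0.1],
}

effects = logit_effects(decoder_dir, unembed)
print("positive:", top_k(effects, 2, positive=True))
print("negative:", top_k(effects, 2, positive=False))
```

    In a real model the vocabulary has tens of thousands of entries and the same ranking would be done with a single matrix-vector product; the pure-Python loop here only keeps the example dependency-free.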
    Activation density: 1.260%
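    Activation density is typically the share of sampled token positions on which the feature's activation is above zero; the page does not state its exact threshold or sample, so that definition is an assumption here. Read that way, 1.260% means the feature fires on roughly 12.6 of every 1,000 tokens. A minimal sketch with a hypothetical helper and made-up sample:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical sample: 1,000 token positions, 13 of which fire.
sample = [0.0] * 987 + [0.4] * 13
print(f"{activation_density(sample):.3f}%")  # 1.300%
```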

    No Known Activations