INDEX
    Explanations

    mentions of governmental figures

    Negative Logits
    Token            Logit
    "theless"        -0.80
    " Primordial"    -0.77
    " sensit"        -0.70
    "schild"         -0.66
    " Shattered"     -0.62
    " IMAGES"        -0.61
    "MQ"             -0.59
    "anwhile"        -0.58
    " Leilan"        -0.58
    " Warfare"       -0.57

    Positive Logits
    Token            Logit
    ".,"             0.89
    "inda"           0.88
    "oration"        0.84
    "orship"         0.82
    "inker"          0.82
    "iture"          0.81
    "omo"            0.81
    "utable"         0.78
    "istries"        0.76
    "inelli"         0.76
    Activation Density: 0.016%

    No Known Activations