INDEX
    Explanations

    mentions of political affiliations and governmental changes

    Negative Logits
    " rám"       -0.15
    "arges"      -0.15
    "arge"       -0.15
    "\Modules"   -0.14
    "oux"        -0.14
    "ARGE"       -0.14
    "onden"      -0.14
    "ãĥ¼ãĥľ"     -0.13
    "ofire"      -0.13
    "ample"      -0.13
    Positive Logits
    " allegiance"   0.42
    " loyalty"      0.40
    " loyal"        0.36
    "loy"           0.34
    " Loy"          0.33
    " alignment"    0.32
    " align"        0.32
    " switch"       0.30
    " switching"    0.30
    " alleg"        0.30
    Activation Density: 0.264%

    No Known Activations