INDEX
    Explanations

    terms related to political situations and interventions

    Negative Logits
    -1.30  </code>
    -0.88  </u>
    -0.78   "
    -0.78  ↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵
    -0.78  ↵↵↵↵
    -0.73  [token not rendered]
    -0.70  ↵↵↵
    -0.69  ↵↵↵↵↵↵↵↵↵↵
    -0.67  ↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵
    -0.66  ↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵
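    Positive Logits
    2.34  »
    2.25  »,
    2.23  ».
    2.19  [token not rendered]
    2.17  [token not rendered]
    2.15  [token not rendered]
    2.10  [token not rendered]
    2.04  »:
    2.02  [token not rendered]
    2.01  )».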
    Activation Density: 0.159%

    No Known Activations