INDEX
    Explanations

    historical references and terminology related to political movements or ideologies

    Negative Logits
    Token             Logit
    "190"             -0.28
    "189"             -0.24
    "191"             -0.23
    "187"             -0.23
    "188"             -0.23
    "ctors"           -0.20
    "192"             -0.19
    " telegram"       -0.19
    "186"             -0.19
    " Alfred"         -0.17
    Positive Logits
    Token             Logit
    "176"             0.41
    "174"             0.36
    "178"             0.35
    "177"             0.35
    "173"             0.35
    "175"             0.33
    "172"             0.31
    "179"             0.31
    " Enlightenment"  0.30
    "171"             0.27
    Activation Density: 0.152%

    No Known Activations