INDEX
    Explanations

    statements related to power dynamics, discrimination, and conspiracy theories

    Negative Logits
    <bos>             -1.53
    rungsseite        -0.60
     autorytatywna    -0.54
     Normdatei        -0.54
    انجليز            -0.52
    Debido            -0.52
     HFILL            -0.52
    IVEREF            -0.51
     disambiguazione  -0.51
    webElementXpaths  -0.49
    Positive Logits
     Abbé             0.89
     ordina           0.88
     Ordre            0.85
     carrefour        0.83
     ecclesias        0.81
     ivi              0.81
     Ottobre          0.80
     Aéroport         0.79
     Confe            0.79
     Bibl             0.78
    Activation Density: 0.844%

    No Known Activations