    Explanations

    phrases related to conspiracy theories and extremist ideologies

    Negative Logits
    " shouldBe"    -0.62
    " įsi"         -0.59
    " rayures"     -0.57
    " apsau"       -0.54
    " gardien"     -0.52
    " attaques"    -0.51
    " nė"          -0.51
    " spreken"     -0.51
    " naudoti"     -0.51
    " virš"        -0.51
    Positive Logits
    " magis"       0.84
    " tamen"       0.77
    " palab"       0.75
    "Läh"          0.74
    "Mitä"         0.74
    " mef"         0.73
    " ibi"         0.71
    " reputa"      0.70
    "Sklici"       0.70
    " priva"       0.69
    Act Density 0.345%

    No Known Activations