INDEX
    Explanations

    instances of the term "false" in various contexts related to misconceptions or misinformation

    Negative Logits
    Token             Logit
    "adiens"          -0.17
    "shal"            -0.16
    "amar"            -0.16
    " ActionTypes"    -0.16
    "lp"              -0.15
    "OfClass"         -0.15
    "ram"             -0.15
    "shire"           -0.15
    "zd"              -0.15
    "istles"          -0.14
    Positive Logits
    Token             Logit
    "hood"            0.27
    " positives"      0.23
    "-flag"           0.21
    " alarms"         0.21
    "-positive"       0.20
    " alarm"          0.19
    "fully"           0.19
    "/false"          0.18
    "claim"           0.17
    "ivec"            0.17
    Activation Density: 0.032%

    No Known Activations