INDEX
    Explanations

    phrases associated with deception or inaccuracies in claims

    Negative Logits
    Token            Logit
    "aires"          -0.15
    " McGr"          -0.15
    "IDS"            -0.15
    "ctors"          -0.14
    "cctor"          -0.14
    " Dog"           -0.14
    " Westbrook"     -0.13
    "flix"           -0.13
    "gende"          -0.13
    "Cou"            -0.13

    Positive Logits
    Token            Logit
    "hte"             0.16
    "hatt"            0.16
    " Pey"            0.15
    "assa"            0.15
    "stal"            0.14
    "ล"               0.14
    "avad"            0.14
    "URN"             0.14
    "urn"             0.14
    " crash"          0.13
    Activation Density: 0.168%

    No Known Activations