INDEX
    Explanations

    phrases that express falsehood or deception

    Head Attribution Weights
    Head  Weight
    0     0.08
    1     0.08
    2     0.08
    3     0.07
    4     0.07
    5     0.08
    6     0.08
    7     0.08
    8     0.08
    9     0.08
    10    0.08
    11    0.08
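The twelve weights above are near-equal shares, each close to 1/12 (about 0.083). One common way attention-output SAE dashboards derive such a per-head breakdown is from the norm of each head's slice of the feature's decoder vector. The sketch below assumes that method and assumes the decoder vector is laid out as `n_heads` contiguous blocks of equal size; `head_attribution_weights` is a hypothetical helper, and this is not confirmed to be how these particular numbers were produced.

```python
import math
from typing import List


def head_attribution_weights(decoder_vec: List[float], n_heads: int) -> List[float]:
    """Hypothetical helper: split a feature's decoder vector into n_heads
    contiguous slices and return each slice's L2 norm as a fraction of the
    summed norms (assumed layout, not confirmed by the dashboard)."""
    if len(decoder_vec) % n_heads != 0:
        raise ValueError("decoder vector length must be divisible by n_heads")
    d_head = len(decoder_vec) // n_heads
    # L2 norm of each head's block of d_head values
    norms = [
        math.sqrt(sum(x * x for x in decoder_vec[h * d_head:(h + 1) * d_head]))
        for h in range(n_heads)
    ]
    total = sum(norms)
    if total == 0:
        return [0.0] * n_heads
    return [n / total for n in norms]


if __name__ == "__main__":
    # Toy decoder vector: 12 heads x 4 dims, every head with the same norm.
    vec = [0.5] * 48
    weights = head_attribution_weights(vec, n_heads=12)
    print([round(w, 2) for w in weights])  # twelve values of 0.08
```

Normalizing by the summed norms makes the weights comparable across features; a head whose slice is slightly smaller (like heads 3 and 4 above) shows up as a slightly lower share.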
    Negative Logits
    Token       Logit
    "estate"    -3.58
    "essen"     -3.54
    "Hispanic"  -2.94
    "Southern"  -2.93
    "gas"       -2.89
    "aum"       -2.88
    "atown"     -2.86
    "roe"       -2.77
    "overty"    -2.72
    "tin"       -2.70
    Positive Logits
    Token         Logit
    " Typhoon"    2.81
    " Typh"       2.60
    " Shogun"     2.57
    " Bus"        2.47
    " Jinping"    2.45
    " doubtless"  2.41
    " queues"     2.39
    " Constable"  2.38
    " Farage"     2.36
    " Huawei"     2.36
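Positive and negative logit lists like these are typically a logit-lens readout: the feature's decoder direction is multiplied by the model's unembedding matrix, and the highest and lowest entries are reported with their tokens. The sketch below assumes that method using plain Python lists; `W_U` (d_model rows by vocab-size columns), `decoder_dir`, and `top_and_bottom_logits` are hypothetical names, and the toy tokens and numbers are illustrative only. For an attention-output feature the direction would first have to be mapped into the residual stream through the output projection; that step is omitted here.

```python
from typing import List, Tuple


def top_and_bottom_logits(
    decoder_dir: List[float],
    W_U: List[List[float]],
    vocab: List[str],
    k: int,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Project a residual-stream direction through W_U and return the k most
    positive and k most negative (token, logit) pairs (hypothetical helper)."""
    d_model, vocab_size = len(W_U), len(vocab)
    if len(decoder_dir) != d_model:
        raise ValueError("decoder_dir length must equal the number of rows in W_U")
    # logits[j] = sum_i decoder_dir[i] * W_U[i][j]
    logits = [
        sum(decoder_dir[i] * W_U[i][j] for i in range(d_model))
        for j in range(vocab_size)
    ]
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


if __name__ == "__main__":
    vocab = ["a", "b", "c", "d"]  # toy vocabulary
    W_U = [
        [-1.0, 2.0, -0.5, 1.0],
        [-0.5, 0.5, -0.25, 0.5],
    ]
    pos, neg = top_and_bottom_logits([1.0, 1.0], W_U, vocab, k=2)
    print(pos)  # [('b', 2.5), ('d', 1.5)]
    print(neg)  # [('a', -1.5), ('c', -0.75)]
```

Because the readout depends only on weights, it can be computed even for a feature that never fires on the sampled data.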
    Activation Density: 0.000%

    No Known Activations
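Activation density is usually reported as the percentage of sampled tokens on which the feature fires, that is, has an activation above zero. The sketch below assumes that definition; `activation_density` and its default threshold of 0 are assumptions. Under this definition, a density of 0.000% is consistent with the note that the feature has no known activations.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds
    `threshold` (0 by default, an assumed firing criterion)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    return 100.0 * active / total if total else 0.0


if __name__ == "__main__":
    dead_feature = [0.0] * 10_000
    print(f"{activation_density(dead_feature):.3f}%")  # 0.000%
    sparse_feature = [0.0] * 999 + [1.7]
    print(f"{activation_density(sparse_feature):.3f}%")  # 0.100%
```

Note that three-decimal rounding can also hide a feature that fires very rarely (below 0.0005% of tokens), so a zero density is best read together with the activation examples.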