INDEX
    Explanations

    phrases related to making statements or declarations

    references to significant events or statements made by public figures

    Negative Logits
    Token         Logit
    helm          -0.67
    blogspot      -0.64
    ufact         -0.61
    DEN           -0.61
    MN            -0.61
    ATED          -0.60
    WER           -0.60
    ãĥ¡           -0.59
     distingu     -0.58
     destro       -0.58
    Positive Logits
    Token         Logit
     bluff        1.04
     hotline      0.85
     Cth          0.70
     kettle       0.67
     ugly         0.64
     Behavior     0.64
    SourceFile    0.62
     derogatory   0.60
     NCT          0.60
     Deity        0.58
    Activation Density: 0.133%

    No Known Activations