    Explanations

    phrases indicating a request for trustworthy information or feedback

    questions regarding trustworthiness and subscriptions to news content

    Negative Logits
    (Tokens are quoted; a leading space is part of the token.)
    Token         Logit
    "beit"        -0.77
    "naire"       -0.67
    " misunder"   -0.63
    "alist"       -0.61
    "esome"       -0.60
    "hof"         -0.60
    "liest"       -0.59
    "eni"         -0.58
    " Calais"     -0.57
    " helicop"    -0.57
    Positive Logits
    Token         Logit
    " Email"      0.91
    "utm"         0.90
    " Subscribe"  0.83
    " Attend"     0.78
    " Become"     0.74
    " Replay"     0.74
    "Content"     0.71
    " Want"       0.71
    " Visit"      0.70
    " Try"        0.69
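    The page does not say how these two lists are produced. A common approach for sparse-autoencoder features, assumed here, is to project the feature's decoder direction through the model's unembedding matrix, giving one value per vocabulary token, and then keep the largest and smallest entries. The sketch below covers only that final ranking step. The function name `top_logit_tokens`, its `k` parameter, and the dictionary input format are hypothetical; the toy input reuses a few values from the lists above.

```python
import heapq
from typing import Dict, List, Tuple


def top_logit_tokens(
    logit_effect: Dict[str, float], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, value) pairs.

    `logit_effect` is assumed (hypothetically) to map each vocabulary token
    to the feature's direct effect on that token's logit.
    """
    items = list(logit_effect.items())
    # Largest values first, matching the order of the Positive Logits list.
    positive = heapq.nlargest(k, items, key=lambda item: item[1])
    # Smallest (most negative) values first, matching the Negative Logits list.
    negative = heapq.nsmallest(k, items, key=lambda item: item[1])
    return positive, negative


# Toy input built from a few values shown above.
effects = {" Email": 0.91, "utm": 0.90, " Subscribe": 0.83,
           "beit": -0.77, "naire": -0.67}
positive, negative = top_logit_tokens(effects, k=2)
print(positive)  # [(' Email', 0.91), ('utm', 0.9)]
print(negative)  # [('beit', -0.77), ('naire', -0.67)]
```

    Using `heapq.nlargest`/`nsmallest` avoids sorting the whole vocabulary when only a handful of tokens are displayed.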
    Activation Density: 0.021%
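    Activation density is usually read as the share of tokens in an evaluation sample on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero; the function name `activation_density`, its `threshold` parameter, and the sample below are hypothetical and not taken from the page.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is strictly above `threshold`."""
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# Hypothetical sample: the feature fires on 21 of 100,000 tokens.
sample = [1.0] * 21 + [0.0] * (100_000 - 21)
print(f"{activation_density(sample):.3f}%")  # 0.021%
```

    Under this reading, 0.021% corresponds to roughly 21 active tokens per 100,000, which marks the feature as very sparse.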

    No Known Activations