INDEX
    Explanations

    phrases suggesting the importance of not solely relying on the speaker's claims

    Negative Logits
    Token       Logit
    "999"       -0.06
    " no"       -0.06
    "fol"       -0.06
    " front"    -0.05
    " Stam"     -0.05
    "erer"      -0.05
    "bard"      -0.05
    " false"    -0.05
    "ú"         -0.05
    "rong"      -0.05
    Positive Logits
    Token       Logit
    "alone"     0.09
    " trust"    0.09
    " alone"    0.09
    "Trust"     0.08
    "مانی"      0.08
    " банку"    0.08
    " Alone"    0.07
    "trust"     0.07
    " Trust"    0.07
    "пов"       0.07
    Activation Density: 0.004%

    No Known Activations