INDEX
    Explanations

    terms associated with falsehoods and inaccuracies

    falsehoods and deceptions

    Negative Logits
    Token               Logit
    " Consideration"    -0.44
    "Aga"               -0.43
    "Kita"              -0.40
    "iVar"              -0.40
    " Kita"             -0.40
    "ườn"               -0.39
    " ngang"            -0.39
    " Kidd"             -0.39
    "Hug"               -0.39
    "ITA"               -0.39

    Positive Logits
    Token               Logit
    " false"            1.40
    " False"            1.30
    "False"             1.25
    "false"             1.22
    " fausse"           1.17
    " falsa"            1.13
    " falsos"           1.12
    " falso"            1.09
    " falsas"           1.08
    " FALSE"            1.07
    Activation Density: 0.025%

    No Known Activations