INDEX
    Explanations

    explicit mentions of the word "false"

    references to falsehoods or misleading claims

    Negative Logits
    "hens"                  -1.00
    " guiActiveUnfocused"   -0.97
    "hetti"                 -0.81
    "asio"                  -0.77
    "ajo"                   -0.76
    "arya"                  -0.75
    "mun"                   -0.75
    "rike"                  -0.73
    "forces"                -0.72
    "aldo"                  -0.71
    Positive Logits
    " positives"       1.01
    " accuser"         0.89
    " guiActiveUn"     0.86
    " dich"            0.85
    " guiIcon"         0.79
    " false"           0.76
    " negatives"       0.76
    " falsely"         0.76
    " accusation"      0.74
    " imprisonment"    0.74
    Activation Density: 0.019%

    No Known Activations