INDEX
    Explanations

    statements or claims that indicate truthfulness or validity

    Negative Logits
     يتيمه           -0.95
    AnchorStyles     -0.90
     Monfieur        -0.90
     Bernadette      -0.90
     avoient         -0.84
     صوتيه           -0.82
     ejus            -0.81
     sélectionnés    -0.76
     <=",            -0.75
     étoient         -0.75
    Positive Logits
     True            1.26
     true            1.25
     TRUE            1.18
     Tru             1.08
    True             1.06
    TRUE             1.03
    Tru              1.03
    stdbool          1.03
    isTrue           1.00
     False           0.96
    Act Density 0.082%

    No Known Activations