INDEX
    Explanations

    negations and expressions of uncertainty or lack of understanding

    Negative Logits
    éį
    -0.16
    ivor
    -0.16
    unto
    -0.14
    XH
    -0.14
    áli
    -0.14
    phones
    -0.14
    wares
    -0.14
    fant
    -0.14
    ائع
    -0.13
     facto
    -0.13
    Positive Logits
     choice
    0.29
     interest
    0.21
    choice
    0.21
     Choice
    0.21
    Choice
    0.19
     patience
    0.18
     Interest
    0.18
     intention
    0.18
     doubt
    0.18
    _interest
    0.18
    Activation Density: 0.045%

    No Known Activations