    Explanations

    phrases related to beliefs, assumptions, and theories

    words related to assumptions and beliefs

    Negative Logits
    Token             Logit
    "Interstitial"    -0.83
    "sung"            -0.75
    "waters"          -0.71
    "odes"            -0.70
    "thumbnails"      -0.69
    "ateurs"          -0.67
    "oho"             -0.65
    "avid"            -0.65
    "umen"            -0.63
    "HCR"             -0.63
    Positive Logits
    Token             Logit
    " assumptions"    1.00
    " assumption"     0.92
    " underpin"       0.80
    "staking"         0.75
    " assumes"        0.73
    "eers"            0.68
    " presupp"        0.67
    " biases"         0.66
    " premise"        0.66
    "arily"           0.65
    Activation Density: 0.015%

    No Known Activations