    Explanations

    explanatory statements

    phrases related to explaining concepts or phenomena

    Negative Logits
    "ngth"          -0.82
    "ille"          -0.75
    "emies"         -0.75
    "kus"           -0.72
    "sembly"        -0.70
    "jab"           -0.67
    "ontent"        -0.66
    " Instruments"  -0.66
    "ctors"         -0.65
    "opers"         -0.63
    Positive Logits
    " why"              1.65
    "why"               1.33
    " WHY"              1.31
    " how"              0.96
    " discrepancies"    0.92
    " Why"              0.88
    " inconsistencies"  0.87
    "Why"               0.84
    " explanations"     0.81
    " disapp"           0.80
    Activation density: 0.052%

    No Known Activations