    Explanations

    phrases related to reasoning or justification

    the word "because," indicating causal relationships in statements

    Negative Logits
        "agin"     -0.78
        "yan"      -0.75
        "nin"      -0.74
        "mint"     -0.72
        "wn"       -0.72
        "Gas"      -0.69
        "lem"      -0.67
        "Luc"      -0.67
        "thal"     -0.64
        "ries"     -0.64
    Positive Logits
        "*/("       0.90
        "uristic"   0.75
        " proxies"  0.74
        "endment"   0.72
        " they"     0.71
        "ority"     0.69
        "ecause"    0.67
        "akening"   0.67
        "urers"     0.65
        "uras"      0.64
    Activation Density: 0.079%
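    A density of 0.079% means the feature fires on roughly 79 of every 100,000 tokens. The sketch below assumes density is defined as the fraction of tokens whose activation exceeds zero; the page does not state the exact definition or threshold used, and `activation_density` is a hypothetical helper, not a dashboard API.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is > threshold.
    """
    if not activations:
        raise ValueError("need at least one token activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 79 of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(79):
    acts[i * 1000] = 1.5  # arbitrary positive activation value

density = activation_density(acts)
print(f"{density:.3%}")  # 0.079%
```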

    No Known Activations