    Explanations

    phrases related to causality and attribution

    Negative Logits
        Token           Value
        " Carbuncle"    -0.63
        "ura"           -0.61
        "ahs"           -0.59
        "iverpool"      -0.59
        "aws"           -0.57
        "ourse"         -0.57
        "esc"           -0.57
        "talk"          -0.57
        " Chains"       -0.57
        " clipboard"    -0.56
    Positive Logits
        Token           Value
        " partly"        1.00
        " solely"        0.98
        " chiefly"       0.90
        " principally"   0.87
        " primarily"     0.84
        " partially"     0.82
        " mainly"        0.81
        " entirely"      0.80
        " largely"       0.78
        " squarely"      0.74
    Activation Density: 0.105%

    No Known Activations