INDEX
    Explanations

    instances of conditional statements describing potential actions

    phrases emphasizing ability or potential actions

    Negative Logits
    Token           Logit
    " Uri"          -0.62
    " UR"           -0.60
    " IB"           -0.59
    " Trin"         -0.58
    "arch"          -0.58
    "soever"        -0.57
    " caution"      -0.57
    "path"          -0.56
    " contention"   -0.55
    " Sod"          -0.55

    Positive Logits
    Token           Logit
    "'t"            1.06
    " afford"       0.94
    " muster"       0.85
    " convince"     0.82
    " help"         0.78
    "adian"         0.77
    " somehow"      0.75
    " survive"      0.74
    "reach"         0.74
    "utils"         0.73

    Activation Density: 0.084%

    No Known Activations