INDEX
    Explanations

    phrases indicating assistance or contribution

    instances of the word "helped."

    Negative Logits
    Token            Logit
    "Policy"         -0.64
    "gran"           -0.61
    "parts"          -0.61
    " separation"    -0.60
    "uns"            -0.60
    "owl"            -0.60
    "clusions"       -0.59
    "War"            -0.59
    " contrace"      -0.58
    "itar"           -0.58
    Positive Logits
    Token            Logit
    " helped"        0.83
    "ĸļ"             0.78
    " propel"        0.74
    " helping"       0.73
    "waukee"         0.72
    " Assist"        0.71
    " usher"         0.71
    "ridor"          0.70
    " buoy"          0.68
    "urated"         0.67
    Activation Density: 0.013%

    No Known Activations