INDEX
    Explanations

    the word "all" appearing with high activation

    occurrences of the word "all."

    Negative Logits
    "aminer"      -0.73
    "potion"      -0.72
    "yip"         -0.69
    "IDS"         -0.64
    "lav"         -0.64
    "zn"          -0.64
    "ker"         -0.63
    "assembly"    -0.62
    "arter"       -0.61
    "oute"        -0.61
    Positive Logits
    "ocating"     1.33
    " kinds"      1.17
    "igators"     1.11
    " sorts"      1.10
    "igator"      1.04
    "iances"      1.02
    "owing"       1.02
    "usions"      1.00
    "ocated"      0.94
    "ocate"       0.93
    Activation Density: 0.139%

    No Known Activations