INDEX
    Explanations

    instances of the word "slice" followed by a number indicating the strength of the activation

    references to "slices" in various contexts, often metaphorically describing parts of a larger whole

    NEGATIVE LOGITS
    "founded"       -0.73
    "administ"      -0.66
    "development"   -0.65
    "supp"          -0.64
    "Found"         -0.64
    "answered"      -0.63
    "Design"        -0.62
    "DCS"           -0.61
    "lied"          -0.60
    "jamin"         -0.60
    POSITIVE LOGITS
    " slices"       1.34
    " slice"        1.34
    "slice"         1.00
    " sliced"       0.88
    "mble"          0.87
    " slicing"      0.83
    "iewicz"        0.82
    "azo"           0.80
    "©¶æ¥µ"         0.79
    "cery"          0.77
    Act Density 0.007%

    No Known Activations