INDEX
    Explanations

    references to traditional concepts, practices, or items across various contexts

    Negative Logits
    arel        -0.16
    ogl         -0.15
    indr        -0.14
    bras        -0.13
    thing       -0.13
    ings        -0.13
    mented      -0.13
    sburg       -0.13
    .joda       -0.13
    /he         -0.13
    Positive Logits
    ists        0.36
    ist         0.31
    ISTS        0.24
    ism         0.24
    itionally   0.23
    /current    0.22
    isti        0.21
    ista        0.21
    -looking    0.21
    istic       0.20
    Act Density 0.029%

    No Known Activations