    Explanations

    phrases indicating universality or generalization

    Negative Logits
    Token           Logit
    "onio"          -0.44
    "ствия"         -0.42
    "Storyboard"    -0.42
    "主意"          -0.41
    "mologie"       -0.41
    "ahal"          -0.41
    " low"          -0.41
    " an"           -0.40
    "|>"            -0.39
    " Ref"          -0.39
    Positive Logits
    Token           Logit
    " every"        1.99
    "every"         1.99
    "Every"         1.94
    " Every"        1.93
    "Chaque"        1.82
    " Chaque"       1.80
    "EVERY"         1.74
    "Each"          1.73
    " Each"         1.72
    " each"         1.72
    Activation Density: 0.408%

    No Known Activations