INDEX
    Explanations

    repetitive words or actions that indicate a sense of universality or inclusivity

    Negative Logits
    Token     Logit
    adol      -0.19
    2         -0.18
    reck      -0.17
    1         -0.16
    99        -0.15
    101       -0.15
     Abbas    -0.15
    elf       -0.15
    of        -0.15
    A         -0.14
    Positive Logits
    Token             Logit
    .scalablytyped    0.17
    Lint              0.17
    "title            0.16
    maal              0.15
    geme              0.15
    ÃĹ↵↵              0.15
    ieber             0.15
    tro               0.15
    ORB               0.15
    ãĥªãĥ¼ãĤº         0.15
    Activation Density: 0.042%

    No Known Activations