INDEX
    Explanations

    proper nouns relating to famous people or specific scenarios

    Negative Logits
    Token          Logit
    " pim"         -0.60
    "REDACTED"     -0.59
    " notch"       -0.55
    " Metatron"    -0.54
    " reconc"      -0.54
    "ACTED"        -0.53
    " tumble"      -0.53
    "Spoiler"      -0.52
    " clown"       -0.52
    " WB"          -0.52
    Positive Logits
    Token          Logit
    "enei"         0.80
    "ilian"        0.78
    "chel"         0.77
    "eli"          0.77
    "ili"          0.76
    "abyte"        0.76
    "eto"          0.76
    "esh"          0.75
    "emon"         0.74
    "iber"         0.74
    Activation Density: 0.908%

    No Known Activations