    Explanations

    references to general concepts or items that are significant or noteworthy

    Negative Logits
        "AnchorStyles"   -0.96
        " pleaſure"      -0.89
        " myſelf"        -0.86
        " juſ"           -0.85
        " ſtate"         -0.83
        " Jefus"         -0.83
        " ſtre"          -0.83
        " uſe"           -0.82
        " viſ"           -0.80
        " ſta"           -0.80
    Positive Logits
        " thing"          1.37
        " things"         1.36
        " Thing"          1.34
        " THING"          1.30
        "Things"          1.28
        " Things"         1.25
        " THINGS"         1.24
        "Thing"           1.21
        "things"          1.10
        "THING"           0.96
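    Dashboards of this kind commonly build the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the highest- and lowest-scoring tokens. The sketch below assumes that method. The helpers `logit_effects` and `top_and_bottom` and the toy vectors are hypothetical illustrations, not this feature's weights and not any library's API.

```python
# Minimal sketch, ASSUMING the "logits" are the dot product of a feature's
# decoder direction with each token's unembedding vector. All vectors and
# helper names here are hypothetical toy data, not this feature's weights.

def logit_effects(decoder_dir, unembed):
    """Return {token: dot(decoder_dir, token's unembedding vector)}."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }

def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative token effects."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    positive = ranked[:k]
    # Re-sort the tail so the most negative token comes first.
    negative = sorted(ranked[-k:], key=lambda kv: kv[1])
    return positive, negative

# Hypothetical 3-dimensional toy example.
decoder_dir = [1.0, 0.5, -0.5]
unembed = {
    " thing": [1.0, 0.5, 0.0],
    " things": [0.8, 0.6, 0.1],
    " ſtate": [-0.4, 0.0, 0.6],
    " uſe": [-0.2, -0.2, 0.4],
}

effects = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(effects, k=2)
print("positive:", positive)
print("negative:", negative)
```

    Quoting each token keeps leading spaces visible, which matters here: " Things" and "Things" are distinct tokens with distinct scores.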
    Act Density 0.080%
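    If "Act Density" is read as the percentage of sampled tokens on which the feature's activation is above zero (an assumption about how this figure is computed), then 0.080% means the feature fires on roughly 8 of every 10,000 tokens. A minimal sketch of that arithmetic, with a hypothetical `activation_density` helper and made-up activations:

```python
# Minimal sketch, ASSUMING "Act Density" = percentage of tokens whose
# activation is strictly positive. The activations are hypothetical toy data.

def activation_density(activations):
    """Percentage of tokens on which the feature's activation is > 0."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)

# Hypothetical sample: 10,000 tokens, and the feature fires on 8 of them.
acts = [0.0] * 10_000
for i in range(8):
    acts[i * 1_000] = 1.5

print(f"Act Density {activation_density(acts):.3f}%")
```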

    No Known Activations