INDEX
    Explanations

    table markdown formatting

    Negative Logits
     humanities  0.90
     firm        0.88
     palm        0.86
     poisonous   0.86
     toxicology  0.85
     biological  0.84
     minced      0.84
     microcosm   0.82
     sole        0.82
     fucking     0.82
    Positive Logits
    |   2.40
     |  2.19
    |\  1.72
    ||  1.71
    |=  1.60
    |"  1.59
    |(  1.57
    |.  1.56
    |,  1.52
    |$  1.51
    Activation Density: 0.211%

    No Known Activations