INDEX
    Explanations

    numerical structures such as lists or countdowns

    the repeated appearance of a specific character or symbol

    Negative Logits
    Token           Logit
    " adolesc"      -0.66
    " suicide"      -0.65
    " ignition"     -0.64
    " Morse"        -0.63
    " Spot"         -0.63
    " Stuff"        -0.62
    " Lancaster"    -0.62
    "undown"        -0.61
    " Antar"        -0.61
    " Addiction"    -0.60
    Positive Logits
    Token           Logit
    "agree"         1.02
    "own"           0.98
    "ï¸ı"           0.97
    "should"        0.89
    "felt"          0.87
    "mand"          0.87
    "tarians"       0.84
    "tu"            0.84
    " selves"       0.83
    "ould"          0.83
    Activation Density: 0.161%

    No Known Activations