INDEX
    Explanations

    phrases containing the word "Tw" followed by a number and possibly other characters

    the presence of specific tokens or symbols related to a particular format or category

    Negative Logits
    Token               Logit
    "ãĤ¹ãĥĪ"            -0.78
    "ãĥ£"               -0.74
    "senal"             -0.73
    "++++++++++++++++"  -0.71
    " PRESS"            -0.64
    "³³³³³³³³³³³³³³³³"  -0.63
    " restricting"      -0.61
    "tenance"           -0.59
    " fracturing"       -0.58
    " territorial"      -0.58
    Positive Logits
    Token               Logit
    "elfth"             1.36
    "enty"              1.25
    "elve"              1.23
    "olves"             1.17
    "ilight"            1.14
    "inkle"             1.13
    "erker"             1.13
    "orld"              1.12
    "erk"               1.12
    "orks"              1.11
    Act Density 0.024%
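
The page does not say how the activation density is computed. A minimal sketch, assuming it is the percentage of token positions on which the feature's activation exceeds a threshold. The function name `activation_density`, the zero threshold, and the sample data are all hypothetical and chosen only to illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions where the feature fires.

    Assumption: a position counts as active when its activation is
    strictly greater than `threshold`. The dashboard's actual
    definition may differ.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 3 active positions out of 12,500 tokens.
acts = [0.0] * 12_497 + [0.8, 1.3, 2.1]
print(f"Act Density {activation_density(acts):.3f}%")
```

Under this assumed definition, a density of 0.024% corresponds to roughly one active position per 4,000 tokens.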

    No Known Activations