    Explanations

    Acronyms starting with "TH" followed by a number

    Instances of the token "TH" and variations of "Thing"

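    Tokens are quoted; a leading space is part of the token.

    Negative Logits                 Positive Logits
    Token         Logit             Token         Logit
    " Libre"      -0.77             " TH"          3.69
    "iste"        -0.68             "TH"           1.85
    "angelo"      -0.66             "Th"           1.54
    "alia"        -0.65             " Th"          1.53
    "Cam"         -0.64             " THR"         1.44
    "inka"        -0.62             " WH"          1.35
    " shepherd"   -0.62             " KN"          1.28
    "nell"        -0.62             " Than"        1.28
    "ello"        -0.62             " TW"          1.27
    "Mil"         -0.62             " STEP"        1.27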
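    A minimal sketch of how ranked positive and negative logit lists like these can be produced. It assumes (this page does not confirm it) that each value is the dot product of the feature's decoder direction with a token's unembedding vector. The helper names `logit_effects` and `top_and_bottom` and the toy vectors are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    # Hypothetical: direct logit effect = dot(decoder direction, unembedding column).
    return {tok: sum(d * u for d, u in zip(decoder_dir, col))
            for tok, col in unembed.items()}


def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    # Sort ascending by effect; the head holds the most negative tokens,
    # and the reversed list's head holds the most positive tokens.
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive


if __name__ == "__main__":
    # Toy 2-d vectors, not real model weights.
    toy_unembed = {" TH": [3.0, 1.0], "TH": [2.0, 0.0],
                   " Libre": [-0.5, 2.0], "iste": [-1.0, 0.0]}
    neg, pos = top_and_bottom(logit_effects([1.0, 0.0], toy_unembed), k=2)
    print(neg)  # [('iste', -1.0), (' Libre', -0.5)]
    print(pos)  # [(' TH', 3.0), ('TH', 2.0)]
```

    Sorting once and slicing both ends keeps the two lists consistent with each other. A real dashboard would do the same ranking over the full vocabulary with a vectorized top-k.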
    Activation density: 0.014% (roughly 14 of every 100,000 tokens)

    No Known Activations