    Explanations

    table-specific tags in the text

    Negative Logits
    Token             Logit
    " himſelf"        -0.59
    "windowFixed"     -0.57
    "ſelf"            -0.56
    " Majefty"        -0.54
    " createSlice"    -0.52
    " pleaſure"       -0.51
    " désert"         -0.50
    " itſelf"         -0.49
    " contacter"      -0.48
    " myſelf"         -0.48
    Positive Logits
    Token             Logit
    "td"              2.59
    " td"             1.80
    "TD"              1.70
    " TD"             1.52
    "Td"              1.16
    "tds"             1.12
    " TDs"            0.90
    "tdc"             0.78
    "OGND"            0.72
    "mtd"             0.67
    Activation Density: 0.001%

    No Known Activations