INDEX
    Explanations

    numerical codes or identifiers

    NEGATIVE LOGITS
    Token     Logit
    "UTH"     -0.18
    " IR"     -0.17
    "ITY"     -0.17
    "USR"     -0.16
    "ULT"     -0.16
    " UP"     -0.16
    " ITS"    -0.15
    "ULO"     -0.15
    "THR"     -0.15
    "[][]"    -0.15

    POSITIVE LOGITS
    Token     Logit
    "CB"      0.22
    "AB"      0.22
    "DE"      0.22
    "BE"      0.22
    "EB"      0.21
    "FB"      0.21
    "B"       0.21
    "FE"      0.20
    "DB"      0.20
    " DE"     0.20
    Activation Density: 0.013%

    No Known Activations