    Explanations

    numeric representations, particularly dates and identifiers

    Negative Logits
    " itſelf"       -1.41
    " myſelf"       -1.38
    " iſt"          -1.26
    " purpoſe"      -1.24
    " ſever"        -1.21
    "ArrowToggle"   -1.19
    " greateſt"     -1.14
    " Reſ"          -1.14
    " ―――――"        -1.14
    "featureID"     -1.13
    -1.13
    Positive Logits
    (token not captured)   0.90
    (token not captured)   0.66
    " I"                   0.65
    "</b>"                 0.65
    "</h2>"                0.62
    (whitespace)           0.62
    "1"                    0.60
    "↵↵"                   0.59
    " ("                   0.59
    " P"                   0.59
    Act Density 0.437%

    No Known Activations