INDEX
    Explanations

    proper nouns, particularly names and titles

    Negative Logits
    "↵↵"             -0.43
    " Rasmussen"     -0.40
    "XCTAssert"      -0.40
    "-"              -0.40
    " Figue"         -0.40
    "<eos>"          -0.39
    ","              -0.38
    " Gebir"         -0.38
    " Cardoso"       -0.38
    " Karlsson"      -0.37

    Positive Logits
    ":✨"             0.92
    " typed"          0.84
    "typed"           0.73
    "parsedMessage"   0.72
    " Typed"          0.71
    " Typing"         0.71
    " Losses"         0.69
    " ſind"           0.69
    " typing"         0.69
    "<unused43>"      0.68
    Act Density 0.235%

    No Known Activations