INDEX
    Explanations

    references to specific individuals with the letter 'H'

    Negative Logits
    Token          Logit
    " McGregor"    -0.16
    "imit"         -0.15
    "asers"        -0.14
    " Arch"        -0.14
    "837"          -0.14
    " Crane"       -0.14
    "ARCH"         -0.14
    "action"       -0.14
    "UY"           -0.14
    " general"     -0.14
    Positive Logits
    Token          Logit
    "iddle"        0.29
    "anks"         0.28
    "IDDLE"        0.25
    "iddles"       0.19
    "ANK"          0.17
    "olls"         0.17
    "linger"       0.16
    "atos"         0.16
    "iddleware"    0.15
    "elsen"        0.15
    Activation Density: 0.007%

    No Known Activations