INDEX
    Explanations

    references to interviews and discussions with various individuals

    Negative Logits
    ioc         -0.17
    iginal      -0.16
    ade         -0.16
    heim        -0.15
    ilded       -0.15
    osal        -0.15
    owing       -0.15
    izons       -0.15
    ities       -0.14
    akis        -0.14
    Positive Logits
    ees         0.20
    ee          0.17
    ys          0.16
    ashington   0.16
    ulse        0.15
    rech        0.15
    392         0.14
    ml          0.14
    lsa         0.14
    ées         0.14
    Act Density 0.022%

    No Known Activations