INDEX
    Explanations

    references to presentation slides

    Negative Logits
    §        -2.90
    ¦        -2.77
    ©        -2.68
    Į        -2.61
    ķ        -2.60
    ĵ        -2.56
    ij        -2.49
    °        -2.48
    ¥        -2.47
    Ħ        -2.38
    Positive Logits
    heet     2.27
    mith     2.22
    heets    2.17
    ource    2.15
    chool    2.00
    ystem    1.94
    pot      1.92
    core     1.91
    urface   1.88
    ior      1.85
    Activation Density: 0.009%

    No Known Activations