INDEX
    Explanations

    references to personal anecdotes and experiences

    Negative Logits
    Token       Logit
     guiName    -0.70
    tesy        -0.68
    ル          -0.68
    manac       -0.67
    ":"/        -0.63
    obook       -0.63
    itled       -0.61
    afe         -0.61
    theless     -0.60
    ublished    -0.60
    Positive Logits
    Token       Logit
     etc        2.00
    etc         1.66
    whatever    1.21
     blah       1.20
     ect        1.11
    ...)        1.06
    …)          1.05
     whatever   1.01
     et         0.85
     yes        0.80
    Activation Density 0.185%

    No Known Activations