INDEX
    Explanations

    references to physical spaces, particularly indoors and outdoors

    Negative Logits
    Token       Logit
    thing       -0.17
    eros        -0.17
    ries        -0.17
    shit        -0.17
    mgr         -0.17
    erus        -0.17
    ataires     -0.16
    ws          -0.15
    eness       -0.15
    maker       -0.15
    Positive Logits
    Token       Logit
    /out        0.40
    -out        0.30
    -Out        0.24
    halb        0.24
     ÙĪØ®       0.23
    OUT         0.22
    Out         0.20
    /up         0.20
    out         0.19
     joke       0.19
    Activation Density: 0.034%

    No Known Activations