    Negative Logits
    " irresponsible"   -0.11
    "Mismatch"         -0.10
    "394"              -0.09
    " Inspiration"     -0.09
    " depreci"         -0.09
    "ines"             -0.09
    "_tid"             -0.09
    " gent"            -0.09
    "兴"               -0.09
    " instinct"        -0.08
    Positive Logits
    " hub"             0.30
    " Pride"           0.26
    " pride"           0.26
    "hub"              0.23
    " Hub"             0.22
    " ambition"        0.21
    "Hub"              0.20
    " eg"              0.19
    " ego"             0.19
    " cov"             0.19
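The two lists above rank tokens by how strongly this feature pushes their logits down or up. The following is a minimal sketch of producing such a ranking. It assumes you already have a per-token logit effect for the feature, for example the feature's decoder direction multiplied by the model's unembedding matrix; that derivation is an assumption and is not stated on this page. The helper name `top_logit_effects` and the toy values are hypothetical.

```python
import heapq


def top_logit_effects(effects: dict[str, float], k: int = 10):
    """Return (most negative, most positive) (token, effect) pairs.

    `effects` is a hypothetical mapping from token string to the
    feature's logit effect on that token.
    """
    # Smallest values first: the tokens the feature suppresses most.
    negative = heapq.nsmallest(k, effects.items(), key=lambda kv: kv[1])
    # Largest values first: the tokens the feature promotes most.
    positive = heapq.nlargest(k, effects.items(), key=lambda kv: kv[1])
    return negative, positive


if __name__ == "__main__":
    # Toy values reusing a few tokens from the lists above.
    toy = {" hub": 0.30, " Pride": 0.26, " ambition": 0.21,
           " irresponsible": -0.11, "Mismatch": -0.10, " instinct": -0.08}
    neg, pos = top_logit_effects(toy, k=2)
    print(neg)  # [(' irresponsible', -0.11), ('Mismatch', -0.1)]
    print(pos)  # [(' hub', 0.3), (' Pride', 0.26)]
```

Tokens are kept as raw strings, so " hub" and "hub" remain distinct entries, just as they do in the lists.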
    Activation Density: 0.103%
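Activation density is most plausibly the share of sampled tokens on which the feature fires. Under that reading, 0.103% means roughly 1 token in 1,000. That definition, the zero threshold, and the helper name `activation_density` are assumptions for illustration. The sketch below computes the metric that way:

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumed definition: count of active tokens / total tokens, times 100.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Toy data: 3 active tokens out of 2,000 gives 0.15%.
    acts = [0.0] * 1997 + [1.2, 0.4, 3.0]
    print(f"{activation_density(acts):.3f}%")  # 0.150%
```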

    No Known Activations