INDEX
    Explanations

    code identifiers and assignments

    Negative Logits
    Token              Logit
    " this"            -1.88
    " which"           -1.67
    " what"            -1.63
    " huge"            -1.57
    " supremely"       -1.55
    " drastically"     -1.55
    " oddly"           -1.53
    " horribly"        -1.53
    " both"            -1.51
    " sophisticated"   -1.51

    Positive Logits
    Token              Logit
    "particularly"     1.45
    " همچنین"          1.34
    "もある"            1.32
    " malgré"          1.30
    "があった"          1.27
    " acclaimed"       1.26
    "despite"          1.26
    " praktisch"       1.26
    "などは"            1.26
    "makes"            1.25
    Activation Density: 0.029%

    No Known Activations