INDEX
    Explanations
    Negative Logits
     avant       -0.07
    んです        -0.07
     ط           -0.07
    REAT         -0.07
    いますが      -0.06
                 -0.06
     disgr       -0.06
     eben        -0.06
    Thinking     -0.06
                 -0.06
    Positive Logits
    _COPY        0.07
    kw           0.07
    	body        0.07
    -one         0.07
    actly        0.07
    	board       0.06
                 0.06
    -w           0.06
    _children    0.06
    illum        0.06
    Act Density 0.061%

    No Known Activations