    Negative Logits
    Token            Logit
    " adap"          -0.08
    " tato"          -0.07
    "耀"             -0.07
    " vrij"          -0.07
    (blank)          -0.07
    "Το"             -0.07
    "τ"              -0.07
    "128"            -0.06
    "129"            -0.06
    " illust"        -0.06

    Positive Logits
    Token            Logit
    " concern"       0.10
    " concerning"    0.09
    " concerns"      0.09
    " concerned"     0.09
    "_SUR"           0.08
    (blank)          0.07
    " Gu"            0.07
    "なんだ"         0.07
    "crc"            0.07
    "Command"        0.06
    Act Density 0.016%

    No Known Activations