INDEX
    Explanations

    surface and layers

    Negative Logits
    Token            Logit
    (not shown)      -0.08
    "里面有"         -0.08
    " Card"          -0.07
    "CommandLine"    -0.07
    " scre"          -0.07
    (not shown)      -0.07
    (not shown)      -0.07
    "çois"           -0.07
    (not shown)      -0.07
    (not shown)      -0.07
    Positive Logits
    Token            Logit
    " hòa"           0.08
    (not shown)      0.08
    "oug"            0.07
    " danger"        0.07
    " mover"         0.07
    "高压"           0.07
    "magic"          0.06
    " #####"         0.06
    " smoker"        0.06
    (not shown)      0.06
    Activation Density: 0.037%

    No Known Activations