INDEX
    Explanations

    multiple languages

    Negative Logits
    Token            Logit
    " he"            -0.07
    " подум"         -0.06
    " You"           -0.06
    " she"           -0.06
    "\tparameters"   -0.06
    " lesb"          -0.06
    " fries"         -0.06
    " who"           -0.06
    " lj"            -0.06
    " that"          -0.06
    Positive Logits
    Token            Logit
    "nou"            0.07
    "ря"             0.07
    "τι"             0.06
    "人気"           0.06
    "-content"       0.06
    "ουν"            0.06
    " παι"           0.06
    "umbing"         0.06
    "ighton"         0.06
    "имость"         0.06
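    Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the most negative and most positive entries. The sketch below illustrates that procedure in plain Python under stated assumptions: the names `decoder_dir`, `unembed`, `logit_attribution`, and `top_logits`, the toy numbers, and the tokens `tok_a` to `tok_c` are all hypothetical, and this page does not say exactly how its values were computed.

```python
from typing import List, Sequence, Tuple


def logit_attribution(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],  # hypothetical shape: [d_model][vocab_size]
) -> List[float]:
    """Project a feature's decoder direction onto every vocabulary logit."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][v] for i in range(len(decoder_dir)))
        for v in range(vocab_size)
    ]


def top_logits(
    scores: Sequence[float], vocab: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # ascending: most negative first
    positive = ranked[::-1][:k]  # descending: most positive first
    return negative, positive


# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
decoder_dir = [1.0, -0.5]
unembed = [[0.2, -0.1, 0.05], [0.0, 0.4, -0.2]]
vocab = ["tok_a", "tok_b", "tok_c"]
scores = logit_attribution(decoder_dir, unembed)
neg, pos = top_logits(scores, vocab, 1)
print(neg, pos)  # most negative: tok_b, most positive: tok_a
```

    Sorting once and slicing from both ends keeps the negative and positive lists consistent with each other; with a real vocabulary of tens of thousands of tokens, a vectorized matrix product would replace the nested loops.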
    Activation Density 0.280%
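    An activation density of 0.280% corresponds to roughly 28 active positions per 10,000 tokens. The sketch below assumes density means the percentage of sampled token positions where the feature's activation exceeds zero; the function name `activation_density` and its `threshold` parameter are hypothetical, and the page does not state the exact definition or dataset used.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation is strictly above threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 28 active positions out of 10,000 sampled tokens gives a density of about 0.28%.
sample = [0.0] * 9972 + [1.3] * 28
print(f"{activation_density(sample):.3f}%")
```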

    No Known Activations