INDEX
    Explanations

    code/technical discussions

    Negative Logits
    Token         Logit
    " شكل"        -0.07
    " shame"      -0.06
    (blank)       -0.06
    " Наг"        -0.06
    " smoking"    -0.06
    " nghe"       -0.06
    "cup"         -0.06
    "esiyle"      -0.06
    (blank)       -0.06
    "umbnail"     -0.06
    Positive Logits
    Token         Logit
    " [=["        0.07
    (blank)       0.07
    " USB"        0.06
    " klas"       0.06
    "-enh"        0.06
    " historia"   0.06
    " خود"        0.06
    " Đại"        0.06
    "ечение"      0.06
    " 자신"       0.06
    Act Density 0.001%

    No Known Activations