    Negative Logits
    Token                 Logit
    WORK                  0.80
     cardboard            0.75
     playwright           0.75
    ların                 0.73
     fisherman            0.73
    (no visible token)    0.73
    ."                    0.71
    འི་                   0.71
     roundtable           0.70
     flashlight           0.70
    Positive Logits
    Token                 Logit
    ق                     1.49
    د                     1.23
    ك                     1.13
    هم                    1.13
    ب                     1.08
    ز                     1.06
    كين                   1.05
    (no visible token)    1.03
    กิน                   1.01
    (no visible token)    0.98
    Act Density 0.007%

    No Known Activations