    Negative Logits
    Token     Value
    " ("      0.98
    "il"      0.91
    "ir"      0.81
    "ar"      0.74
    "ene"     0.72
    "c"       0.71
    "gere"    0.68
    "ts"      0.66
    "ak"      0.65
    "io"      0.64
    Positive Logits
    Token                 Value
    [no visible token]    0.80
    "ي"                   0.79
    "문에"                0.78
    "Г"                   0.77
    [no visible token]    0.75
    "ر"                   0.75
    " migraines"          0.74
    "р"                   0.74
    [no visible token]    0.74
    "е"                   0.72
    Act Density 0.033%

    No Known Activations