    NEGATIVE LOGITS
    Token                 Value
    (no visible token)    0.64
    (no visible token)    0.63
    (no visible token)    0.59
    "^                    0.58
    ması                  0.57
    (no visible token)    0.57
    cityName              0.57
    فونبیټ                0.54
    جوړونک                0.53
    zatim                 0.53
    POSITIVE LOGITS
    Token                 Value
    ↵↵                    0.91
    ب                     0.83
    a                     0.77
    ↵↵↵                   0.73
    ing                   0.71
    us                    0.68
    ev                    0.67
    o                     0.65
    se                    0.64
    (no visible token)    0.63
    Activation Density: 0.822%

    No Known Activations