    Negative Logits
    Token            Value
    " "              1.41
    (not rendered)   1.14
    "را"             0.95
    " Bridge"        0.93
    "↵↵"             0.90
    "اج"             0.88
    " is"            0.87
    "h"              0.86
    "tanggal"        0.84
    "いを"           0.84
    Positive Logits
    Token            Value
    "0"              1.54
    (not rendered)   1.27
    "guide"          1.20
    "inę"            1.18
    (not rendered)   1.16
    "in"             1.15
    " ০৯"            1.14
    (not rendered)   1.09
    "𝟬"              1.08
    "𝟎"              1.05
    Activation Density: 0.013%

    No Known Activations