INDEX
    Explanations
    Negative Logits
                   2.73
    "ت"            2.60
    " demean"      2.38
    " combate"     2.35
    " mism"        2.35
    "تس"           2.26
    " ironic"      2.24
    " Confeder"    2.24
    " fara"        2.19
    "чително"      2.17
    Positive Logits
    "ویں"          3.08
    "ي"            2.81
    "ある"          2.80
    "و"            2.76
    "𝑐"            2.73
    "σ"            2.67
    "きの"          2.55
    "𝑖"            2.55
    "на"           2.48
                   2.48
    Act Density 0.009%

    No Known Activations