INDEX
    Explanations

    acronyms and abbreviations

    Negative Logits
    (blank)     0.49
    "ucapkan"   0.47
    " Γ"        0.46
    "রূপ"       0.45
    " Β"        0.44
    "HNO"       0.43
    (blank)     0.43
    "ства"      0.43
    "ξύ"        0.42
    " Süden"    0.41
    Positive Logits
    "و"         1.05
    "на"        0.88
    "ו"         0.88
    "2"         0.80
    " on"       0.79
    "1"         0.72
    (blank)     0.71
    "ف"         0.71
    "ка"        0.66
    "ان"        0.64
    Activation Density: 0.570%

    No Known Activations