INDEX
    NEGATIVE LOGITS
    Token   Value
    2       1.60
    5       1.55
    1       1.52
    7       1.52
    3       1.50
    4       1.44
    9       1.39
    8       1.34
    LEGO    1.20
    0       1.19
    POSITIVE LOGITS
    Token      Value
    ا          1.27
    ción       1.02
               1.02
    وأن        0.99
    י          0.98
    ciation    0.97
    اً          0.96
    ز          0.96
    دارة       0.95
    ולנד       0.95
    Activation Density: 0.021%

    No Known Activations