    Explanations

    numbers in version strings

    Negative Logits
    Token              Value
    "ش"                0.87
    "1"                0.80
    "س"                0.77
    "2"                0.72
    " виды"            0.68
    "ج"                0.66
    "рі"               0.64
    (token missing)    0.64
    "ي"                0.63
    "i"                0.62
    Positive Logits
    Token              Value
    "A"                0.73
    "has"              0.64
    " has"             0.60
    " U"               0.60
    " E"               0.58
    " K"               0.58
    " B"               0.57
    " S"               0.57
    " A"               0.55
    " "                0.55
    Activation Density: 0.004%

    No Known Activations