INDEX
    Explanations
    Negative Logits
    " cooler"            -0.07
    "160"                -0.07
    " الذين"             -0.07
    "[field"             -0.07
    " ков"               -0.06
    "قدر"                -0.06
    " book"              -0.06
    " μαζί"              -0.06
    " Book"              -0.06
    (token not shown)    -0.06
    Positive Logits
    " syntax"            0.15
    "Syntax"             0.13
    " Syntax"            0.13
    "syntax"             0.11
    "-syntax"            0.10
    " synt"              0.08
    "(Syntax"            0.08
    "NTAX"               0.08
    " parity"            0.08
    ".syntax"            0.08
    Activation Density: 0.005%

    No Known Activations