    NEGATIVE LOGITS
        Token              Logit
        "ни"               1.22
        "1"                1.17
        (not rendered)     0.98
        " grammar"         0.98
        " water"           0.94
        " reproductive"    0.94
        " tulip"           0.93
        " turquoise"       0.92
        " racquet"         0.90
        "-"                0.90
    POSITIVE LOGITS
        Token              Logit
        "in"               1.74
        "لية"              1.45
        "al"               1.39
        "ور"               1.34
        "ن"                1.32
        "ر"                1.31
        (not rendered)     1.30
        "ل"                1.24
        "к"                1.21
        "л"                1.20
    Activation density: 0.003%

    No known activations.