INDEX
    Negative Logits
    Token         Logit
    "avig"        -0.07
    " سالم"       -0.07
    ".listeners"  -0.06
    ".But"        -0.06
    " strate"     -0.06
    " Ni"         -0.06
    " стат"       -0.06
    " neutral"    -0.06
    "beit"        -0.06
    " Km"         -0.06
    Positive Logits
    Token         Logit
    " enclosed"   0.13
    " enclosure"  0.12
    " enclosing"  0.11
    " imprison"   0.07
    "losure"      0.07
    " Dixon"      0.07
    "ِل"          0.07
    " secure"     0.07
    " üst"        0.06
    " ẩn"         0.06
    Activation Density: 0.003%

    No Known Activations