INDEX
    Explanations
    Negative Logits
        " weapons"      -0.07
        "Techn"         -0.07
        "5"             -0.06
        " version"      -0.06
        "877"           -0.06
        ",j"            -0.06
        " dj"           -0.06
        " forbidden"    -0.06
        "js"            -0.06
        " redd"         -0.06
    Positive Logits
        "ate"            0.20
        "ATE"            0.17
        "ates"           0.13
        "ات"             0.13
        "olate"          0.13
        "ате"            0.12
        " Nate"          0.11
        "late"           0.11
        "cate"           0.11
        "ATES"           0.11
    Act Density: 0.036%

    No Known Activations