    Negative Logits
     besie          -0.07
     instance       -0.07
     nieuwe         -0.07
     Spare          -0.06
     metaphor       -0.06
     köy            -0.06
     Vương          -0.06
    یده             -0.06
    -shaped         -0.06
     ép             -0.06
    Positive Logits
     maximal         0.07
    identification   0.07
    のみ             0.07
    dff              0.07
    Originally       0.07
     مواد            0.07
                     0.06
     $↵              0.06
    ẫn               0.06
    dsa              0.06
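    The two logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down (negative) or up (positive). A common way to build such lists, assumed here rather than confirmed by this page, is to project the feature's decoder vector through the model's unembedding matrix and keep the k lowest and k highest scores. The sketch below uses hypothetical names (`top_logit_effects`, `decoder_dir`, `unembed`) and toy values, not this feature's real weights.

```python
def top_logit_effects(decoder_dir, unembed, vocab, k=10):
    """Rank tokens by the logit change a feature direction induces.

    Assumption (hypothetical helper): decoder_dir has length d_model and
    unembed is a d_model x d_vocab matrix given as a list of rows.
    """
    d_model = len(decoder_dir)
    # Logit effect for token j: dot product of the decoder direction with column j of unembed.
    logits = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    order = sorted(range(len(vocab)), key=lambda j: logits[j])
    negative = [(vocab[j], logits[j]) for j in order[:k]]            # most negative first
    positive = [(vocab[j], logits[j]) for j in reversed(order[-k:])]  # most positive first
    return negative, positive


# Toy example: only the first residual dimension matters because decoder_dir[1] == 0.
neg, pos = top_logit_effects(
    decoder_dir=[1.0, 0.0],
    unembed=[[0.5, -0.2, 0.1, -0.7], [9.0, 9.0, 9.0, 9.0]],
    vocab=["a", "b", "c", "d"],
    k=2,
)
print(neg)  # [('d', -0.7), ('b', -0.2)]
print(pos)  # [('a', 0.5), ('c', 0.1)]
```

    Real dashboards typically do this with a matrix library over the full vocabulary; plain lists keep the sketch dependency-free.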
    Act Density 0.023%
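    Activation density is usually read as the fraction of sampled tokens on which the feature fires; that definition, and the firing threshold of 0, are assumptions here, not something this page states. The hypothetical `activation_density` helper below shows how a figure like 0.023% would arise: a feature active on 23 out of 100,000 tokens.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the activation exceeds threshold.

    Assumption: a feature "fires" when its activation is strictly above threshold.
    """
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in acts if a > threshold)
    return firing / len(acts)


# Hypothetical data: 23 firing tokens among 100,000 sampled tokens.
acts = [0.0] * 99_977 + [1.5] * 23
print(f"Act Density {activation_density(acts):.3%}")  # Act Density 0.023%
```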

    No Known Activations