INDEX
    Explanations

    math word problems

    Negative Logits
     сель
    -0.08
     нал
    -0.08
    esas
    -0.08
     avoidance
    -0.07
    &amp
    -0.07
    ac
    -0.07
     رم
    -0.07
    etas
    -0.07
    avorites
    -0.07
    "];↵
    -0.07
    Positive Logits
    0.09
     الأصل
    0.09
     wanda
    0.09
     უკან
    0.08
     reconstruct
    0.08
    0.08
     Herc
    0.08
     før
    0.08
     ursprüng
    0.08
    Underlying
    0.08
    Act Density 0.116%

    No Known Activations