INDEX
    Explanations

    logical reasoning

    Negative Logits
        Token           Logit
        " asap"         -0.08
        " bait"         -0.08
        " ček"          -0.07
        " rim"          -0.07
        " Robert"       -0.07
        " april"        -0.07
        "stelle"        -0.07
        " rt"           -0.07
        "_resources"    -0.07
        "agen"          -0.07
    Positive Logits
        Token                   Logit
        " orientations"         0.10
        " unterschiedliche"     0.10
        " المختلفة"             0.10
        " hues"                 0.09
        " transformations"      0.09
        " orientation"          0.09
        " unterschiedlichen"    0.09
        " differing"            0.09
        " Vielfalt"             0.09
        " togg"                 0.08
    Activation Density: 0.035%

    No Known Activations