INDEX
    Explanations
    Negative Logits
         correta    -0.08
        (Property   -0.07
        (lambda     -0.07
                    -0.07
        (make       -0.07
        _layout     -0.07
         René       -0.07
        mall        -0.07
         બાબ        -0.07
        چې          -0.07
    Positive Logits
         invented    0.08
         можем       0.08
         بحاجة       0.08
         gotta       0.08
         Kennt       0.08
         Cra         0.07
         avons       0.07
         want        0.07
         him         0.07
        aved         0.07
    Act Density 0.040%

    No Known Activations