INDEX
    Explanations
    Negative Logits
    intios      -0.68
    المكان      -0.64
     fevere     -0.62
     &___       -0.61
     propOrder  -0.60
    ckså        -0.60
     fubject    -0.60
    Gesch       -0.60
     thorny     -0.60
    اقتصاد      -0.59
    Positive Logits
    matcher     0.47
    woman       0.45
    mens        0.41
    rator       0.41
    board       0.40
    <()>        0.39
    allus       0.39
    ring        0.39
    rager       0.38
    pend        0.38
    Activation Density: 0.003%

    No Known Activations