    Explanations

    weight gain

    Negative Logits
    lendirme     -0.07
     '?'         -0.07
    jp           -0.07
    ду           -0.07
    (blank)      -0.07
    _clause      -0.07
    oa           -0.06
     Gtk         -0.06
    _adj         -0.06
     fuer        -0.06
    Positive Logits
     กรก         0.06
    [int         0.06
     bou         0.06
    (blank)      0.06
     imperial    0.06
    .Println     0.05
     gem         0.05
     Phương      0.05
    Another      0.05
     العامة      0.05
    Activation Density: 0.011%

    No Known Activations