    Negative Logits
        Token           Logit
        "']);"          -0.07
        " traditions"   -0.07
        " sans"         -0.07
        " etiquette"    -0.07
        "Indented"      -0.06
        " bakımından"   -0.06
        " singles"      -0.06
        " seasonal"     -0.06
        "\tvalid"       -0.06
        " repeated"     -0.06
    Positive Logits
        Token           Logit
        "svm"            0.07
        " konz"          0.06
        " Гри"           0.06
        "_ul"            0.06
        "озв"            0.06
        " Yer"           0.06
        " gyro"          0.06
        "DU"             0.06
        "ková"           0.06
        "awk"            0.06
    Activation Density: 0.007%

    No Known Activations