    Explanations

    phrases indicating consistency with prior research or findings

    Negative Logits
    Token                  Logit
    "addPreferredGap"      -0.54
    "Atsauces"             -0.41
    "addGap"               -0.41
    " препратки"           -0.40
    (blank)                -0.40
    "RegressionTest"       -0.39
    "Ligações"             -0.39
    "prefixer"             -0.35
    " slutt"               -0.35
    "tvguidetime"          -0.34
    Positive Logits
    Token                  Logit
    "consistent"           0.60
    "Consistent"           0.59
    " consistent"          0.56
    "characteristic"       0.51
    "endfor"               0.50
    " Consistent"          0.50
    "EClass"               0.48
    "expected"             0.47
    " للمعارف"             0.47
    " characteristic"      0.47
    Activation Density: 0.109%

    No Known Activations