    Explanations

    elements related to generalization and evidence in various contexts

    Negative Logits
    Token           Logit
    "帖最后由"      -0.47
    " Selatan"      -0.39
    "boer"          -0.37
    "marvin"        -0.35
    " Mahoney"      -0.33
    " perquè"       -0.32
    " Numerade"     -0.31
    "Unit"          -0.31
    " Hert"         -0.30
    "execu"         -0.30
    Positive Logits
    Token           Logit
    "########."     0.66
    " only"         0.65
    " stanovnika"   0.60
    "only"          0.59
    " только"       0.59
    " חיצוניים"     0.59
    "featureID"     0.59
    " tylko"        0.57
    " seulement"    0.52
    " Only"         0.51
    Activation Density: 1.242%

    No Known Activations