INDEX
    Explanations

    proper nouns or names

    NEGATIVE LOGITS
    " understatement"   -0.60
    " Romanian"         -0.57
    " agric"            -0.57
    " stat"             -0.56
    " summ"             -0.56
    " stuffing"         -0.56
    " nort"             -0.55
    " overw"            -0.55
    " actionGroup"      -0.55
    " defic"            -0.55
    POSITIVE LOGITS
    "K"     3.51
    "KS"    2.32
    "k"     2.19
    "KI"    2.09
    "KK"    2.06
    "KR"    2.06
    "KA"    1.95
    "KER"   1.89
    "KT"    1.89
    " K"    1.88
    Activation Density: 0.027%

    No Known Activations