INDEX
    Explanations

    Plural nouns

    Negative Logits
        " Hubb"          -0.07
        "different"      -0.07
        "\tlayer"        -0.07
        " unhealthy"     -0.06
        " openly"        -0.06
        "Ro"             -0.06
        " EAST"          -0.06
        ".character"     -0.06
        "AlmostEqual"    -0.06
        (token missing)  -0.06
    Positive Logits
        "EXEC"           0.06
        "_info"          0.06
        "人民"           0.06
        "_outline"       0.06
        (token missing)  0.06
        "ード"           0.06
        " noticeable"    0.06
        "iture"          0.06
        "Reminder"       0.06
        "Fecha"          0.06
    Activation Density: 0.051%

    No Known Activations