INDEX
    Explanations

    emotional expressions and references to human experiences

    Negative Logits
    Token             Logit
    " cyn"            -0.15
    "ection"          -0.15
    "umber"           -0.15
    "errar"           -0.15
    "apple"           -0.14
    "InitialState"    -0.14
    "FromNib"         -0.14
    " afs"            -0.13
    "iad"             -0.13
    " level"          -0.13
    Positive Logits
    Token             Logit
    " Dund"           0.16
    "бом"             0.15
    " fik"            0.15
    "ingo"            0.14
    "iko"             0.14
    "upert"           0.14
    "intree"          0.14
    "astr"            0.14
    "igm"             0.14
    "ISCO"            0.14
    Activation Density: 0.004%

    No Known Activations