INDEX
    Explanations

    attends to numeric values associated with certain features from specific tokens related to features or references

    Head Attr Weights
    0:0.07
    1:0.10
    2:0.14
    3:0.09
    4:0.07
    5:0.05
    6:0.22
    7:0.22
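
As a rough reading aid, the head attribution weights listed above can be ranked to see which attention heads dominate. The sketch below copies the eight listed values into a Python dict. The variable names are illustrative and not part of any dashboard API, and treating the weights as per-head shares of one total is an assumption.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_weights = {0: 0.07, 1: 0.10, 2: 0.14, 3: 0.09,
                4: 0.07, 5: 0.05, 6: 0.22, 7: 0.22}

# Rank heads from largest to smallest weight; sorted() is stable,
# so tied heads keep their index order.
ranked = sorted(head_weights.items(), key=lambda kv: -kv[1])
top_two = [head for head, _ in ranked[:2]]

# Round to two decimals to match the precision of the listed values.
total = round(sum(head_weights.values()), 2)
top_two_sum = round(sum(weight for _, weight in ranked[:2]), 2)

print("top heads:", top_two)
print("their combined weight:", top_two_sum, "of", total)
```

Heads 6 and 7 tie at 0.22 each. The listed values do not sum to exactly 1.00, which is presumably a rounding effect in the displayed figures.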
    Negative Logits
    expandindo
    -0.58
    دانشنامهٔ
    -0.52
     مرئيه
    -0.41
    aarrggbb
    -0.38
    abestanden
    -0.37
    RetentionPolicy
    -0.37
    bcryptjs
    -0.36
    PathVariable
    -0.36
    elemField
    -0.35
    createCell
    -0.35
    Positive Logits
    putin
    0.34
    SequentialGroup
    0.31
    REP
    0.30
    onalds
    0.28
     Warszawa
    0.28
    idf
    0.28
     PrintWriter
    0.28
    PhysRevD
    0.28
    0.28
    goog
    0.27
    Act Density 0.002%

    No Known Activations