    Explanations

    phrases referring to certain characteristics or types mentioned in a comparison

    references to types or categories of things

    Negative Logits
    Token             Logit
    "ansas"           -0.71
    "LESS"            -0.67
    "Shut"            -0.63
    "MAP"             -0.63
    "CD"              -0.61
    "hole"            -0.60
    "Downloadha"      -0.59
    "idav"            -0.58
    " Goodbye"        -0.56
    "adan"            -0.56

    Positive Logits
    Token             Logit
    " magnitude"       1.48
    " caliber"         1.43
    " calib"           1.36
    " nature"          1.32
    " stature"         1.30
    " importance"      1.24
    " proportions"     1.23
    " il"              1.23
    " size"            1.20
    " sorts"           1.17
    Activation Density: 0.179%

    No Known Activations