INDEX
    Explanations

    phrases related to self-reference or self-awareness

    following "self-" prefix

    self-hyphenated descriptors

    Negative Logits
    YOND              -0.70
    VideoCapture      -0.62
     PLWABN           -0.57
    Weblinks          -0.55
    kör               -0.54
     Recipro          -0.52
     parteci          -0.51
    strokeStyle       -0.51
     peccato          -0.50
     utafitiHapana    -0.50
    Positive Logits
     nahilalakip      0.76
     flag             0.56
    flag              0.55
     Flag             0.52
    tadır             0.50
    centered          0.49
     importance       0.49
    righ              0.49
     wichtiger        0.48
     important        0.48
    Act Density 0.153%

    No Known Activations