INDEX
    Explanations

    references to people and their attributes or actions

    NEGATIVE LOGITS
    " Went"      -0.17
    " Arch"      -0.16
    "esian"      -0.15
    "lags"       -0.15
    " often"     -0.15
    "pair"       -0.15
    " blank"     -0.15
    " Licht"     -0.14
    " "          -0.14
    " Often"     -0.14
    POSITIVE LOGITS
    " eskort"    0.19
    "rosse"      0.15
    "ToDevice"   0.15
    "icias"      0.14
    "ãĤ¯ãĥĪ"     0.14
    "RATION"     0.14
    "rant"       0.14
    "GLOBALS"    0.14
    "ิà¸Ĺ"       0.14
    "lendi"      0.14
    Act Density 0.035%

    No Known Activations