INDEX
    Explanations

    references to actors and their roles in films or shows

    Negative Logits
    Token               Logit
    " voc"              -0.15
    "ゾ"                -0.15
    "andler"            -0.14
    " thổ"              -0.14
    "estre"             -0.14
    "islav"             -0.14
    " decomposition"    -0.14
    " Wend"             -0.14
    "utenberg"          -0.14
    " же"               -0.14
    Positive Logits
    Token               Logit
    "hdl"               0.15
    "ount"              0.14
    "isch"              0.14
    "Bond"              0.14
    " Bond"             0.14
    "gh"                0.14
    "agh"               0.14
    "ประช"              0.14
    "ars"               0.14
    " đàn"              0.13
    Activation Density: 0.020%

    No Known Activations