    Explanations

    names of actors and characters in the context of films

    Negative Logits
    "eb"        -0.16
    "ctor"      -0.15
    "seau"      -0.14
    "ying"      -0.14
    "aign"      -0.14
    "ei"        -0.14
    "策"        -0.14
    " Ep"       -0.14
    "endors"    -0.13
    "addir"     -0.13
    Positive Logits
    "ensem"     0.16
    " dual"     0.16
    "Dual"      0.15
    " Dual"     0.15
    " Nope"     0.14
    "×↵↵"       0.14
    ".ms"       0.14
    "687"       0.14
    "вичай"     0.14
    "ランス"    0.14
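    This page does not say how the logit lists are computed. A common approach for sparse-autoencoder features, assumed here rather than confirmed by the page, is to multiply the feature's decoder direction by the model's unembedding matrix and read off the most positive and most negative entries. The sketch below follows that assumption; `top_logit_effects`, `decoder_dir`, `W_U`, and `vocab` are hypothetical names, and any extra normalization the dashboard may apply is not modeled.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank vocabulary tokens by how strongly one feature's decoder direction
    raises (positive) or lowers (negative) their logits.

    Assumed inputs (hypothetical, not taken from the dashboard):
      decoder_dir: shape (d_model,), the feature's decoder row
      W_U:         shape (d_model, vocab_size), the unembedding matrix
      vocab:       list of token strings of length vocab_size
    """
    # Direct effect of the feature direction on every vocabulary logit.
    logit_effects = decoder_dir @ W_U              # shape (vocab_size,)
    order = np.argsort(logit_effects)              # ascending order
    negative = [(vocab[i], float(logit_effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(logit_effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy usage: a 2-dimensional model with a 3-token vocabulary.
neg, pos = top_logit_effects(
    np.array([1.0, -1.0]),
    np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5]]),
    ["a", "b", "c"],
    k=1,
)
print(neg, pos)
```

    Because the ranking depends only on fixed weights, the lists can be produced even when no activating examples are available.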
    Activation Density: 0.027%
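    Activation density is usually defined as the share of sampled tokens on which the feature fires, so 0.027% corresponds to roughly 27 activating tokens per 100,000. The minimal sketch below assumes that "fires" means an activation strictly above zero; the `activation_density` helper and its `threshold` parameter are hypothetical, not taken from this page.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions whose feature activation
    exceeds `threshold` (assumed to be 0 for a ReLU-style SAE feature)."""
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy usage: one of four token positions is active.
print(activation_density([0.0, 0.0, 0.5, 0.0]))  # 25.0
```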

    No Known Activations