INDEX
    Explanations

    the presence of actor names in the context of film descriptions

    Negative Logits
    agne  -0.14
    apol  -0.14
    tember  -0.14
    ÑĢÑıдÑĥ  -0.14
     exercitation  -0.14
    -Star  -0.14
    .LogWarning  -0.13
    696  -0.13
    kud  -0.13
    _EXTERN  -0.13
    Positive Logits
     ab  0.15
     cor  0.15
    oola  0.14
     scene  0.14
     Trace  0.14
    ubo  0.14
    rada  0.14
     Invisible  0.13
     stabil  0.13
    otate  0.13
    Act Density 0.060%

    No Known Activations