INDEX
    Explanations

    references to movies and their characteristics

    Negative Logits
    Token          Logit
    "rew"          -0.16
    "xp"           -0.15
    "ös"           -0.15
    "asts"         -0.15
    "symbols"      -0.14
    "struments"    -0.14
    " byt"         -0.14
    "idades"       -0.14
    "hel"          -0.14
    " gente"       -0.14
    Positive Logits
    Token          Logit
    " meisten"     0.20
    " same"        0.18
    " confines"    0.16
    " mism"        0.16
    " beiden"      0.15
    "irth"         0.15
    "ourd"         0.15
    "ลาย"          0.15
    " Adolescent"  0.15
    "acher"        0.15
    Activation Density: 0.048%

    No Known Activations