INDEX
    Explanations

    references to different types of media such as TV shows, movies, books, and anime

    Negative Logits
    "dden"              -0.67
    " Helpful"          -0.59
    " forgetting"       -0.59
    "sbm"               -0.56
    "Unknown"           -0.56
    "Redd"              -0.55
    " Wrong"            -0.55
    "ãģ¦"               -0.54
    "dropping"          -0.54
    " understatement"   -0.54

    Positive Logits
    " consists"         1.29
    " consisted"        1.28
    " comprises"        1.16
    " revolves"         1.15
    " debuted"          1.15
    " contains"         1.03
    " boasts"           1.02
    " premiered"        1.02
    " underwent"        1.01
    " originated"       1.01
    Activation Density: 0.324%

    No Known Activations