INDEX
    Explanations

    references to novels and fictional works

    Negative Logits
    "["              -0.55
    "y"              -0.51
    "Dance"          -0.51
    "لاثة"           -0.50
    " disciplinary"  -0.50
    "in"             -0.50
    " Ehren"         -0.49
    "-"              -0.49
    "ش"              -0.47
    "."              -0.47
    Positive Logits
    " novel"         3.63
    " Novel"         3.38
    "novel"          3.28
    "Novel"          3.17
    " NOVEL"         3.02
    " novels"        2.51
    " Novels"        2.27
    " novelist"      1.80
    " novelists"     1.74
    " novelty"       1.64
    Activation Density: 0.069%

    No Known Activations