INDEX
    Explanations

    references to George Orwell's works and related themes

    Negative Logits
    Token        Logit
    "iales"      -0.16
    " ten"       -0.15
    "(FALSE"     -0.14
    "ONGL"       -0.14
    "ecta"       -0.14
    "Ñħо"        -0.14
    "ertino"     -0.13
    "éϵ"         -0.13
    "à¥įतम"      -0.13
    "zd"         -0.13
    Positive Logits
    Token        Logit
    "180"        0.40
    "400"        0.40
    "500"        0.39
    " "          0.39
    "300"        0.37
    "130"        0.36
    "120"        0.36
    "160"        0.35
    "150"        0.35
    "600"        0.34
    Activation Density: 0.489%

    No Known Activations