INDEX
    Explanations

    references to monsters and themes of control or dominance

    Negative Logits
    (whitespace)   -0.21
     latter        -0.19
    elman          -0.17
    lessly         -0.16
     Latter        -0.15
    ulo            -0.15
    cred           -0.15
    ITED           -0.14
    brick          -0.14
    538            -0.14
    Positive Logits
    oton           0.20
    ingly          0.18
    ously          0.17
    .Mon           0.17
    itored         0.16
    ous            0.16
    aco            0.16
    (mon           0.16
    ÑĢаÑī         0.16
    odies          0.15
    Activation Density 0.046%

    No Known Activations