INDEX
    Explanations

    phrases that express comparisons or contrasts

    Negative Logits
    " UNUSED"   -0.17
    " Anc"      -0.16
    " Orc"      -0.15
    "ople"      -0.15
    "croft"     -0.14
    "ooky"      -0.14
    "allee"     -0.14
    "397"       -0.14
    "rome"      -0.13
    "osp"       -0.13
    Positive Logits
    "artment"   0.15
    "ksen"      0.15
    "aycast"    0.14
    "æ¿"        0.14
    "NGTH"      0.14
    "lements"   0.14
    "olib"      0.14
    "igar"      0.14
    "å°Ķ"       0.14
    "olf"       0.13
    Activation Density: 0.119%

    No Known Activations