INDEX
    Explanations

    references to visual media and links to supplementary materials

    Negative Logits
    Token         Logit
    "ambi"        -0.14
    "íį¼"         -0.12
    "ozem"        -0.12
    "rug"         -0.12
    "\Json"       -0.12
    "ë»"          -0.12
    "ëŀĺìĬ¤"      -0.12
    " stav"       -0.11
    "ynam"        -0.11
    "rve"         -0.11
    Positive Logits
    Token            Logit
    " below"         0.59
    " above"         0.47
    "below"          0.45
    " BELOW"         0.42
    " beneath"       0.40
    " blow"          0.38
    " ниже"          0.36
    " underneath"    0.36
    " Below"         0.36
    " bel"           0.35
    Activation Density: 0.142%

    No Known Activations