INDEX
    Explanations

    references to sheep and goats

    Negative Logits
    Token        Logit
    ãĥĥãĤ¯ãĤ¹    -0.16
    nder         -0.15
    MORE         -0.14
    179          -0.14
    mour         -0.14
    _RB          -0.14
    lh           -0.13
    adro         -0.13
    yar          -0.13
    ÙĦاÙĦ        -0.13
    Positive Logits
    Token        Logit
    alam         0.17
    eko          0.16
    innen        0.15
    anship       0.15
    eyed         0.15
    eshire       0.15
    é§           0.15
    alian        0.14
    els          0.14
    ault         0.14
    Activation Density: 0.019%

    No Known Activations