INDEX
    Explanations

    phrases indicating existence or presence

    Negative Logits
    Token      Logit
    "rna"      -0.15
    "ami"      -0.15
    "oya"      -0.15
    "ıi"       -0.14
    " bor"     -0.14
    "gend"     -0.14
    "ue"       -0.14
    "é«"       -0.14
    "å°ĸ"      -0.13
    " Til"     -0.13
    Positive Logits
    Token      Logit
    " lies"    0.21
    " lie"     0.18
    "reich"    0.18
    "olit"     0.15
    " Lies"    0.15
    " lying"   0.15
    "yonel"    0.14
    ".ERR"     0.14
    "'aff"     0.14
    "yt"       0.14
    Activation Density: 0.060%

    No Known Activations