    Explanations

    mentions of a specific architectural landmark, particularly variations of its name

    Negative Logits
    Token          Logit
    "suppress"     -0.16
    "침"           -0.15
    "jak"          -0.15
    "arend"        -0.15
    "er"           -0.15
    "dst"          -0.14
    "stice"        -0.14
    "velopment"    -0.14
    "erdem"        -0.14
    "临"           -0.14
    Positive Logits
    Token          Logit
    " cast"        0.28
    ".Cast"        0.27
    " Cast"        0.27
    " iron"        0.24
    ".cast"        0.23
    "ellan"        0.22
    " Iron"        0.22
    "les"          0.21
    "ell"          0.21
    "Cast"         0.21
    Activation Density: 0.008%

    No Known Activations