INDEX
    Explanations

    references to nobility or aristocracy

    Negative Logits
    acen      -0.18
    cı        -0.18
     Fritz    -0.17
    icked     -0.16
    PIX       -0.15
    eldon     -0.15
    agh       -0.15
    quin      -0.15
    abaj      -0.15
    ocab      -0.14
    Positive Logits
    les       0.32
    lemen     0.30
    ility     0.29
    odies     0.26
    LES       0.23
    iliary    0.22
    ilities   0.21
    bery      0.19
    ilis      0.19
    bled      0.18
    Activation Density: 0.005%

    No Known Activations