    Explanations

    mentions of people and their roles or actions within the context of narratives

    Negative Logits
        Token        Logit
        " embark"    -0.15
        "626"        -0.15
        "ameleon"    -0.14
        " pdf"       -0.14
        "UND"        -0.13
        "↵↵"         -0.13
        " ot"        -0.13
        "pdf"        -0.12
        "fen"        -0.12
        "_fold"      -0.12
    Positive Logits
        Token        Logit
        " serves"    0.27
        " serve"     0.25
        " heads"     0.23
        " served"    0.23
        "heads"      0.21
        "serve"      0.20
        " runs"      0.20
        " head"      0.19
        " works"     0.19
        " spear"     0.19
    Activation Density: 0.238%
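    The sketch below shows how an activation density like the 0.238% above might be computed. It assumes that density means the fraction of token positions where the feature's activation is strictly positive; the dashboard does not state its exact definition or threshold. The function name `activation_density` and the sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as "active" when its activation is
    strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one token position")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 24 of which are active.
acts = [0.0] * 10_000
for i in range(24):
    acts[i * 400] = 1.5  # arbitrary positive activation value

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.240%
```

    At the reported 0.238%, the feature fires on roughly one token in 420 (1 / 0.00238 ≈ 420).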

    No Known Activations