INDEX
    Explanations

    mentions of characters and their roles in narratives

    Negative Logits
    seo       -0.17
    day       -0.16
    ese       -0.16
    ew        -0.15
    amer      -0.15
    arious    -0.15
    orget     -0.15
    yan       -0.15
    ÑĢа       -0.15
    oyo       -0.15
    Positive Logits
    istically    0.36
    izations     0.27
    istics       0.25
    isation      0.24
    istik        0.24
    izing        0.22
    itics        0.21
    ised         0.21
    izes         0.20
    ized         0.20
    Activation Density: 0.035%

    No Known Activations