INDEX
    Explanations

    references to interpersonal relationships and characters involved in dialogues

    Negative Logits
    "atham"       -0.18
    " Zimmerman"  -0.15
    "pin"         -0.15
    "tero"        -0.15
    " Bowling"    -0.15
    " neutral"    -0.15
    "vet"         -0.14
    " relations"  -0.14
    "l"           -0.14
    "ior"         -0.14
    Positive Logits
    "FORCE"     0.16
    ".Layer"    0.15
    "ustr"      0.15
    "åĿĽ"       0.15
    "vertiser"  0.14
    "orns"      0.14
    "-sama"     0.14
    " fel"      0.14
    " felony"   0.14
    "ILD"       0.14
    Act Density 0.094%

    No Known Activations