    Explanations

    dialog and interactions between noble characters

    (␣ = leading space, ↵ = newline inside the token)

    Negative Logits
        Token   Logit
        ␣(“     -0.84
        ␣(‘     -0.80
        ),”     -0.74
        ),↵     -0.74
        .},     -0.74
        ␣‘      -0.73
        。)     -0.71
        )』     -0.70
        )」     -0.69
        :↵      -0.67

    Positive Logits
        Token   Logit
        "       3.00
                1.68
        "'      1.44
        ''      1.29
        "?      1.24
        "-      1.24
        ".      1.23
        "(      1.21
        "!      1.20
        "…      1.19
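The two logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. Below is a minimal sketch of how such lists are commonly produced, assuming (as is typical for sparse-autoencoder dashboards, though this page does not say so) that each token's value is the dot product of the feature's decoder direction with that token's column of the unembedding matrix W_U. The function names, the toy vocabulary, and the matrices are hypothetical.

```python
# Hypothetical sketch: function names and toy data are illustrative only.
# Assumption: a token's logit value is the dot product of the feature's
# decoder direction with that token's column of the unembedding matrix W_U.

def logit_effect(decoder_vec, unembed):
    """Project a d_model decoder vector through W_U (d_model rows x vocab cols)."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_vec[j] * unembed[j][t] for j in range(len(decoder_vec)))
        for t in range(vocab_size)
    ]


def top_and_bottom_logits(vocab, effects, k=10):
    """Return the k most promoted and k most suppressed (token, value) pairs."""
    pairs = list(zip(vocab, effects))
    # Largest values first: tokens the feature promotes.
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    # Most negative values first: tokens the feature suppresses.
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ['"', "'", " (“", ":"]
W_U = [
    [1.0, 0.5, -1.0, 0.0],
    [0.5, 0.0, 0.25, -0.5],
]
decoder = [2.0, 1.0]

effects = logit_effect(decoder, W_U)
pos, neg = top_and_bottom_logits(vocab, effects, k=2)
print(pos)  # [('"', 2.5), ("'", 1.0)]
print(neg)  # [(' (“', -1.75), (':', -0.5)]
```

A real dashboard would do the projection with a matrix library over the full vocabulary; plain lists are used here only to keep the sketch dependency-free.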
    Activation Density: 0.135%
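The density figure reads as a firing rate: 0.135% corresponds to about 135 active positions per 100,000 tokens, or roughly one token in 740. A minimal sketch of the calculation, assuming a position counts as active when the feature's activation is strictly positive (a common convention that this page does not state); `activation_density` and the synthetic data are hypothetical.

```python
# Hypothetical sketch: assumes density = share of sampled token positions
# at which the feature's activation is strictly above a threshold (0 here).

def activation_density(activations, threshold=0.0):
    """Return the percentage of positions where the feature fires."""
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# 135 firing positions out of 100,000 sampled tokens gives 0.135%.
acts = [0.0] * 100_000
for i in range(135):
    acts[i * 700] = 1.0  # arbitrary positive activation value
print(activation_density(acts))  # 0.135
```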

    No Known Activations