INDEX
    Explanations

    references to dialogue and dialog-related structures in text

    Negative Logits
    MENT
    -0.16
    ment
    -0.16
     slee
    -0.15
    bound
    -0.15
    ugg
    -0.15
    aby
    -0.15
    idge
    -0.14
    ぞ
    -0.14
    igans
    -0.14
    cher
    -0.14
    Positive Logits
    ues
    0.30
    /dialog
    0.20
    UES
    0.19
    uese
    0.18
    atical
    0.16
    (Dialog
    0.16
    gable
    0.15
    あった
    0.15
    UE
    0.15
    相手
    0.15
    Act Density 0.013%

    No Known Activations