INDEX
    Explanations

    words related to theatrical concepts or performance

    Negative Logits
    es          -0.25
    hole        -0.24
    halt        -0.24
    ho          -0.24
    hb          -0.23
    t           -0.23
    hora        -0.22
    h           -0.22
    hum         -0.22
    hoff        -0.22

    Positive Logits
    ting         0.31
    tempt        0.29
    rices        0.26
    rice         0.26
    tempts       0.25
    ernal        0.25
    te           0.25
    uration      0.23
    ransition    0.23
    ronic        0.23

    Activation Density: 0.084%

    No Known Activations