INDEX
    Explanations

    Utterance speaker labels or dialogue line prefixes (speaker names at the start of lines).

    Negative Logits
    theros
    -0.08
    ModelError
    -0.07
    .AD
    -0.07
    (""
    -0.07
    -0.07
    😗
    -0.07
    	glColor
    -0.07
    .BufferedReader
    -0.07
    -0.07
    ynom
    -0.07
    Positive Logits
     Shen
    0.08
    altet
    0.07
    "});↵
    0.07
     والن
    0.07
     Schn
    0.07
    اعد
    0.07
     Sweden
    0.07
    来找
    0.07
     }(
    0.06
     קנ
    0.06
    Act Density 0.078%

    No Known Activations