INDEX
    Explanations

    words followed by punctuation

    Negative Logits
    0.85
     सीएचएसएल
    0.80
    0.80
    <unused1738>
    0.80
    0.80
    <unused231>
    0.78
    ὺς
    0.77
    <unused1666>
    0.76
    0.76
    0.76
    Positive Logits
    <eos>
    3.73
    2.11
    <end_of_turn>
    1.99
    ↵↵↵↵↵↵↵↵
    1.87
    ↵↵↵↵↵↵↵↵↵↵
    1.79
    ↵↵↵↵↵↵↵↵↵
    1.78
    ↵↵↵↵↵↵↵↵↵↵↵↵
    1.75
    ↵↵↵↵↵↵↵↵↵↵↵↵↵
    1.74
    ↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵
    1.72
    ↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵
    1.71
    Activation Density 0.707%

    No Known Activations