INDEX
    Explanations

    specific items in a list

    colons followed by lists or items

    Negative Logits
    gow          -0.65
    orate        -0.64
    oland        -0.63
    ashington    -0.62
    pty          -0.61
    veland       -0.59
    agon         -0.58
    iliate       -0.58
    ritten       -0.58
    hed          -0.57
    Positive Logits
    <|endoftext|>    1.15
    (blank)          1.14
     ●               1.08
     •               1.04
    ↵↵               1.03
     ·               0.99
     Firstly         0.93
    ↵ + NBSP         0.90
    •                0.87
    ●                0.86
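The two logit lists rank vocabulary tokens by how strongly this feature pushes their logits up or down. Here is a minimal sketch of how such lists are commonly produced. It assumes (the page does not say) that each value is the dot product of the feature's decoder direction with a token's unembedding vector, and that the page shows the ten most negative and ten most positive entries. The function names, the `decoder_dir`/`unembed`/`vocab` inputs, and the toy numbers are all hypothetical. Some tools also fold in final layer-norm scaling, and this sketch leaves that out.

```python
from typing import Sequence


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> list[float]:
    """Dot product of a feature's decoder direction with every token's unembedding.

    decoder_dir has length d_model; unembed has d_model rows and vocab_size
    columns, so unembed[i][t] is dimension i of token t's unembedding vector.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects: Sequence[float], vocab: Sequence[str], k: int = 10):
    """Return the k most negative and the k most positive (token, value) pairs."""
    pairs = list(zip(vocab, effects))
    negative = sorted(pairs, key=lambda p: p[1])[:k]                # most negative first
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]  # most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy 2-dimensional model with a 4-token vocabulary (made-up numbers).
    vocab = ["a", "b", "c", "d"]
    decoder_dir = [1.0, -0.5]
    unembed = [[0.2, -1.5, 0.5, 0.0],
               [0.4, 0.0, -1.0, 2.0]]
    negative, positive = top_and_bottom(logit_effects(decoder_dir, unembed), vocab, k=2)
    print(negative)  # [('b', -1.5), ('d', -1.0)]
    print(positive)  # [('c', 1.0), ('a', 0.0)]
```

With a real model, `vocab` would hold the tokenizer's decoded tokens and `k=10` would give lists shaped like the ones on this page.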
    Act Density 0.120%
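An activation density of 0.120% means the feature is active on roughly 1 in 830 sampled tokens. The sketch below assumes that "active" means a strictly positive activation. The page states neither the sampling corpus nor any threshold, so the `threshold` parameter and the example data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation is strictly above threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # 12 firing tokens out of 10,000 sampled tokens (made-up numbers).
    acts = [1.0] * 12 + [0.0] * 9988
    print(f"Act Density {activation_density(acts):.3f}%")  # Act Density 0.120%
```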

    No Known Activations