INDEX
    Explanations
    NEGATIVE LOGITS
    "....."        1.36
    " dgn"         1.28
    "......."      1.26
    "........"     1.14
    "Perhaps"      1.13
    "......"       1.13
    " perhaps"     1.12
    ")...."        1.11
    " VERY"        1.11
    "perhaps"      1.10
    POSITIVE LOGITS
    " dude"          1.07
    " (`"            0.99
    "<unused2197>"   0.98
    "めっちゃ"        0.96
    "<unused2204>"   0.94
    "<unused2221>"   0.94
    (token missing)  0.92
    " yeah"          0.90
    "<unused2169>"   0.88
    (token missing)  0.86
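The page does not say how these logit lists were computed. Dashboards of this kind often rank vocabulary tokens by a feature's direct effect on the output logits. The sketch below assumes that effect is the dot product of the feature's decoder direction with each column of the unembedding matrix, with the final LayerNorm ignored. It then takes the k most positive and k most negative tokens. The functions `logit_effects` and `top_and_bottom`, and the toy vectors, are hypothetical illustrations, not this dashboard's actual method or data.

```python
# Hypothetical sketch: rank vocabulary tokens by a feature's direct logit effect.
# Assumption: effect[t] = dot(decoder_dir, unembed[:, t]); final LayerNorm ignored.

def logit_effects(decoder_dir, unembed):
    """decoder_dir: d_model floats; unembed: d_model rows x vocab_size columns."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects, vocab, k):
    """Return (most positive, most negative) lists of (token, effect) pairs."""
    order = sorted(range(len(effects)), key=lambda t: effects[t])  # ascending
    negative = [(vocab[t], effects[t]) for t in order[:k]]
    positive = [(vocab[t], effects[t]) for t in reversed(order[-k:])]
    return positive, negative


# Toy example with made-up numbers (not taken from this feature).
vocab = [" dude", " perhaps", " yeah", "....."]
decoder_dir = [1.0, 0.5]
unembed = [
    [1.0, -1.0, 0.5, -2.0],
    [0.2, 0.0, 0.4, 0.4],
]
effects = logit_effects(decoder_dir, unembed)
pos, neg = top_and_bottom(effects, vocab, k=2)
print(pos)
print(neg)
```

Sorting once and slicing both ends keeps the two lists consistent with each other. A real model would use a tensor library and a top-k call over a vocabulary of many thousands of tokens.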
    Activation density: 0.146%
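Activation density is commonly read as the fraction of tokens on which the feature is active. At 0.146%, that is about 146 tokens per 100,000, or roughly 1.5 per 1,000. The sketch below assumes a token counts as active when the feature's activation exceeds a threshold of zero. Both the function name `activation_density` and the threshold are illustrative assumptions, not the dashboard's documented definition.

```python
# Hypothetical sketch: activation density as the fraction of tokens where the
# feature fires. Assumption: "fires" means activation > threshold (default 0.0).

def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above threshold."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy example: 146 active tokens out of 100,000 gives 0.146%.
acts = [0.0] * 99_854 + [0.5] * 146
density = activation_density(acts)
print(f"{density:.3%}")
```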

    No Known Activations