    Explanations

    comparative adjectives indicating a higher degree or quantity

    Negative Logits
    Token         Logit
    "<eos>"       -0.41
    "Special"     -0.36
                  -0.34
    "經過"        -0.32
    "for"         -0.31
    " Penga"      -0.31
    "leading"     -0.30
    " Special"    -0.30
    "The"         -0.29
    "these"       -0.28
    Positive Logits
    Token          Logit
    " パンチラ"    0.88
    "<unused3>"    0.87
    "<unused14>"   0.87
    "<unused41>"   0.87
    "<unused42>"   0.87
    "[@BOS@]"      0.87
    "<unused8>"    0.87
    "<unused17>"   0.87
    "<unused74>"   0.87
    "<unused16>"   0.87
    Activation Density: 0.042%

    No Known Activations