    Explanations

    punctuation and formatting related to citations and references

    Negative Logits
    FACT        -0.66
     Mann       -0.65
    っか        -0.65
    zhou        -0.62
     bottom     -0.61
     regular    -0.60
     Morrison   -0.59
     Maru       -0.59
    brot        -0.58
     bit        -0.58
    Positive Logits
    ()),    1.47
    '),     1.43
    ”),     1.41
    "),     1.40
    }),     1.40
    )),     1.40
    ]),     1.39
    >),     1.35
    ])),    1.35
    ]),     1.34
    Activation Density: 0.465%

    No Known Activations