INDEX
    Explanations

    here's a breakdown/explanation

    Negative Logits
    Token                Value
    "Examples"           0.86
    "examples"           0.81
    "例えば"              0.79
    "example"            0.77
    "そのような"           0.77
    " etmektedir"        0.76
    "たとえば"             0.74
    " esempi"            0.71
    " beispielsweise"    0.71
    " exempel"           0.70
    Positive Logits
    Token                Value
    " spoiler"           1.15
    " Spoiler"           1.15
    " buckle"            1.12
    " prepping"          1.06
    " prepare"           1.01
    " Prepare"           1.00
    " figuring"          0.97
    " Here"              0.96
    " here"              0.95
    " caveat"            0.94
    Activation Density: 0.671%

    No Known Activations