    Negative Logits
     kao        1.00
     strange    0.93
     pang       0.91
     specifica  0.90
     weird      0.90
     activo     0.87
     perplex    0.86
     kwa        0.86
                0.85
     mulig      0.84
    Positive Logits
    $\          1.49
    \#          1.41
    \[          1.35
    $$\         1.31
     $\         1.25
    \-          1.24
    \           1.23
    \|          1.23
    \|\         1.16
     $\$        1.11
    Act Density 0.075%

    No Known Activations