    Negative Logits
    Token               Logit
    "<bos>"             -0.50
    "rull"              -0.40
    " unrecogn"         -0.40
    " Niet"             -0.40
    "centr"             -0.40
    " Infórmanos"       -0.39
    "otho"              -0.39
    " betweenstory"     -0.38
    " Crit"             -0.37
    "chtig"             -0.36

    Positive Logits
    Token               Logit
    " waves"             2.02
    " Waves"             1.89
    "waves"              1.88
    "Waves"              1.82
    " wave"              1.80
    "wave"               1.67
    "Wave"               1.65
    " Wave"              1.62
    " WAVE"              1.48
    "WAVE"               1.36

    Activation Density: 0.007%

    No Known Activations