INDEX
    Explanations
    Negative Logits
    " představ"       -0.44
    " przem"          -0.43
    " it"             -0.42
    "couvrez"         -0.40
    " inilah"         -0.40
    " geest"          -0.39
    " nærm"           -0.38
    " tegen"          -0.38
    " this"           -0.37
    " culmination"    -0.37
    Positive Logits
    " extra"           0.98
    " EXTRA"           0.92
    " zusätzlichen"    0.84
    " additional"      0.80
    " ekstra"          0.80
    "extra"            0.79
    " Extra"           0.77
    "additional"       0.75
    " zusätz"          0.75
    " added"           0.74
    Activation Density: 0.015%

    No Known Activations