INDEX
    Explanations

    various, different, many

    Negative Logits
    pe
    0.38
    aine
    0.38
    0.36
    the
    0.36
    either
    0.34
    wa
    0.34
    timestep
    0.34
    fresh
    0.33
     either
    0.33
    atea
    0.33
    Positive Logits
     różnych
    0.49
     المختلفة
    0.47
     různých
    0.46
     várias
    0.44
     verschiedenen
    0.44
     amelyek
    0.43
     различ
    0.43
     різних
    0.43
     różne
    0.43
    いろいろ
    0.42
    Act Density 0.391%

    No Known Activations