INDEX
    Explanations

    references to guidance, role models, and influential examples in various contexts

    Negative Logits
    " Интер"          -0.37
    " nghe"           -0.36
    " INTER"          -0.36
    "MAPPING"         -0.36
    "ellite"          -0.36
    "toplasmic"       -0.36
    " diagnosing"     -0.36
    "ajuku"           -0.35
    "Gön"             -0.35
    " zachod"         -0.35
    Positive Logits
    " Vorbild"        0.58
    " example"        0.57
    " inspire"        0.56
    " demonstration"  0.54
    " EXAMPLE"        0.54
    "inspire"         0.54
    "example"         0.53
    " esempi"         0.51
    " exempl"         0.51
                      0.51
    Act Density 0.317%

    No Known Activations