INDEX
    Explanations

    model versions and numbers

    Mentions of specific language-model names, versions, or size identifiers (e.g., model names with suffixes like "-13B", "1.5", or "16K").

    Negative Logits
    "让你"             0.28
    " fraught"         0.27
    " yıldır"          0.27
    "常见的"           0.27
    "गाई"              0.26
    " неред"           0.26
    [no token text]    0.26
    " subordination"   0.26
    " Scientology"     0.26
    " Harry"           0.26

    Positive Logits
    " icin"            0.31
    " version"         0.30
    " eight"           0.28
    " 版本"            0.28
    " II"              0.27
    " modello"         0.27
    " ursprünglich"    0.27
    " training"        0.27
    " Version"         0.27
    " optimized"       0.26

    Activation Density: 0.185%

    No Known Activations