INDEX
    Explanations

    No clear pattern detected in the provided activations for neuron 4; further analysis may be needed

    the word "wouldn't" and its variations, indicating skepticism or hypothetical scenarios

    Negative Logits
    Token             Logit
    "ULT"             -0.70
    " Proced"         -0.66
    " Gutenberg"      -0.58
    "PI"              -0.57
    "gaard"           -0.57
    "dimensional"     -0.57
    "Offline"         -0.57
    " Casting"        -0.56
    " Learning"       -0.56
    " Butt"           -0.56
    Positive Logits
    Token             Logit
    "'t"              1.29
    "geon"            0.97
    "atically"        0.85
    "terness"         0.85
    "geons"           0.82
    "agy"             0.82
    "acies"           0.80
    "¹"               0.79
    "etsk"            0.78
    "ģĸ"              0.77
    Act Density 0.017%

    No Known Activations