INDEX
    Explanations

    hypothetical fights

    The neuron fires most strongly on tokens in question phrases of the form “who would win in a fight…”; that is, it responds to the key words and punctuation of fight-outcome queries.

    Negative Logits
    824           -0.07
     WC           -0.06
    ього          -0.06
    'ét           -0.06
     compact      -0.06
     hammer       -0.06
     naval        -0.06
    yyyyMMdd      -0.06
    .epoch        -0.06
    getService    -0.06
    Positive Logits
    .Children     0.07
    vince         0.07
                  0.06
     Kat          0.06
     плит         0.06
    gener         0.06
    landers       0.06
     compounded   0.06
    portal        0.06
     Hib          0.06
    Act Density 0.027%

    No Known Activations