INDEX
    Explanations

    repel/repulsion

    The neuron fires on occurrences of words denoting repulsion (e.g. “repel,” “repulsion”).

    Negative Logits
    Logit   Token
    -0.07   ↵ ↵
    -0.07   開発
    -0.07   ngừng
    -0.07   che
    -0.06   massively
    -0.06   překlad
    -0.06   banners
    -0.06   ----------------------------------------------------------------------↵
    -0.06   (no token text)
    -0.06   sight
    Positive Logits
    Logit   Token
    0.07    uber
    0.06    Pompe
    0.06    ?id
    0.06    .exc
    0.06    .password
    0.06    (no token text)
    0.06    (rhs
    0.06    [out
    0.06    alcan
    0.06    (no token text)
    Act Density 0.003%
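
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes that activation density means the fraction of token positions on which the neuron's activation exceeds a threshold; the page does not state its definition, so treat that as an assumption. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumed definition: the dashboard does not say how it computes density.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 3 active positions out of 100,000 tokens.
sample = [0.0] * 100_000
for i in (10, 5_000, 99_999):
    sample[i] = 1.5

density = activation_density(sample)
print(f"{density * 100:.3f}%")
```

Under this assumed definition, a density of 0.003% would correspond to roughly 3 active positions per 100,000 tokens.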

    No Known Activations