INDEX
    Explanations

    The neuron fires on occurrences of the word “expense” (and closely related expense‐tracking terms).

    Negative Logits
     weaving    -0.07
     donor      -0.07
    ideal       -0.07
     cycling    -0.07
     cycle      -0.07
     sits       -0.07
    olvable     -0.06
     molecule   -0.06
     Magnet     -0.06
     bottle     -0.06
    Positive Logits
     expenses   0.14
     Expenses   0.14
     expense    0.12
    expenses    0.11
    expense     0.10
     Expense    0.10
    Expense     0.10
     расход     0.08
     еж         0.07
    แผ          0.07
    Act Density 0.005%

    No Known Activations