INDEX
    Explanations

    The neuron activates on occurrences of the word “reverse” (including “reversed”).

    Negative Logits
    Lua
    -0.06
     '|'
    -0.06
     Jain
    -0.06
    _Height
    -0.06
    sticky
    -0.06
     зуст
    -0.06
     vacations
    -0.06
    Ship
    -0.06
    -0.06
    "What
    -0.06
    Positive Logits
     protester
    0.08
     devam
    0.07
     createState
    0.06
    ução
    0.06
    					 
    0.06
    	glog
    0.06
     مقابل
    0.06
    cepts
    0.06
     الناس
    0.06
     đoán
    0.06
    Act Density 0.006%

    No Known Activations