Explanations
dialogues
The neuron activates on tokens in the “Summary:” portion of the prompt, i.e. on words in the summary rather than in the article.
Negative Logits
ForgeryToken    -0.07
上が            -0.06
premium         -0.06
hundreds        -0.06
t�              -0.06
网              -0.06
Hundreds        -0.06
khỏe            -0.06
IFT             -0.06
airline         -0.06
Positive Logits
увався          0.07
Merkezi         0.06
incess          0.06
وظ              0.06
sock            0.06
tabl            0.06
клуб            0.06
Sheila          0.06
-END            0.06
-ab             0.06
Activations Density 0.010%
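
The density figure above can be read as the share of sampled tokens on which the neuron fires. The page does not say how it is computed, so the following is only a minimal sketch under that assumption: the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical, and the real dashboard may use a different cutoff or token sample.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    neuron is active. The source page does not define it.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 tokens, exactly one of which activates the neuron.
sample = [0.0] * 9_999 + [1.3]
density = activation_density(sample)
print(f"Activations Density {density:.3%}")
```

Under this reading, a density of 0.010% corresponds to roughly one active token per 10,000 sampled, which would fit a neuron that fires only in a narrow context such as the summary portion of a prompt.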