Explanations
punctuation
This neuron detects instructional phrases that specify using only information from the given documents.
Negative Logits
bullshit      -0.07
Kew           -0.06
假            -0.06
ki            -0.06
ABCDEFG       -0.06
_iteration    -0.06
grooming      -0.06
Storyboard    -0.06
Particle      -0.06
890           -0.06
Positive Logits
عز            0.07
Follow        0.07
hpp           0.06
вне           0.06
valued        0.06
�             0.06
bohydr        0.06
principales   0.06
фор           0.06
ger           0.06
Activations Density: 0.024%