Explanations
The neuron responds to mentions of “teacher” and “student” models (and related distillation terms) in the text.
Negative Logits
盤
-0.07
groups
-0.06
ché
-0.06
⠀
-0.06
pe
-0.06
*****
-0.06
Riyadh
-0.06
graph
-0.06
('/')
-0.06
resize
-0.06
Positive Logits
oblivious
0.07
.Messaging
0.07
Vit
0.06
Lip
0.06
mell
0.06
resolutions
0.06
ecess
0.06
<QString
0.06
Default
0.06
uits
0.06
Activations Density 0.036%
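As a rough illustration of how a density figure like the one above could be computed, the sketch below assumes that "activation density" means the fraction of sampled tokens on which the neuron's activation is above zero. That definition, the function name activation_density, the threshold parameter, and the sample data are all hypothetical and are not taken from this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    neuron fires at all. The dashboard's exact definition may differ.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 tokens, of which 4 activate the neuron.
sample = [0.0] * 9996 + [0.8, 1.2, 0.3, 2.1]
density = activation_density(sample)
print(f"Activations Density {density:.3%}")
```

A density this low would mean the neuron is silent on nearly all tokens, which fits a neuron that responds only to a narrow topic such as teacher/student distillation terms.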