Explanations
information regarding copyright ownership and permissions
Neuron Alignment
Index    Value    % of L₁
1150     +0.10    0.4%
50       +0.07    0.3%
789      +0.06    0.2%
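The "% of L₁" column is not defined on this page. One plausible reading, offered here as an assumption, is that it reports each neuron's absolute alignment value as a share of the L₁ norm of the full alignment vector. The three rows are at least consistent with that reading if the norm is roughly 25 (0.10 / 25 = 0.4%, 0.07 / 25 ≈ 0.3%, 0.06 / 25 ≈ 0.2%); the figure of 25 is inferred from the rounded values, not stated in the source. A minimal sketch, with hypothetical function names:

```python
def l1_norm(weights):
    """Sum of absolute values of a vector (its L1 norm)."""
    return sum(abs(w) for w in weights)


def percent_of_l1(value, norm):
    """Return |value| as a percentage of a given L1 norm.

    Assumption: this is how a "% of L1" column could be computed;
    the dashboard itself does not document the formula.
    """
    if norm == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return 100.0 * abs(value) / norm


# Toy vector (not data from the dashboard) with an L1 norm of exactly 1.0.
toy = [0.5, -0.25, 0.25]
toy_shares = [percent_of_l1(w, l1_norm(toy)) for w in toy]

# Values from the table, checked against the inferred norm of about 25.
table_values = [0.10, 0.07, 0.06]
table_shares = [round(percent_of_l1(v, 25.0), 1) for v in table_values]
```

Because the displayed values are rounded to two decimals, the implied norm can only be estimated loosely from the table.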
Correlated Neurons
Index    Pearson Corr.    Cos Sim.
1109     +0.10            0.04
1802     +0.07            0.04
1150     +0.06            0.03
Negative Logits
Token     Logit
<bos>     -1.44
/**       -0.72
<!--      -0.71
/*        -0.71
pull      -0.70
ുറ        -0.68
ⓧ         -0.67
run       -0.65
解        -0.64
expand    -0.63
Positive Logits
Token          Logit
copyrighted    2.07
maneu          1.72
affor          1.71
Juf            1.63
impra          1.63
maroc          1.62
philanth       1.61
eiffel         1.61
suscep         1.59
swarovski      1.58
Activations Density: 0.104%