INDEX
Explanations
terms related to specific concepts and entities, like flavors, behaviors, materials, and locations
Neuron Alignment
Index   Value   % of L₁
1783    +0.11   0.3%
394     +0.09   0.2%
1539    +0.09   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1783    +0.11      0.05
1748    +0.09      0.03
1539    +0.09      0.03
Negative Logits
***!               -0.72
ControllerAdvice   -0.50
مرئيه              -0.49
ğına               -0.49
Elő                -0.47
officiels          -0.47
getreten           -0.45
CascadeType        -0.45
conséquence        -0.45
LabelTagHelper     -0.45
Positive Logits
ftu     1.06
ftre    0.97
paff    0.95
effe    0.91
waer    0.90
thut    0.89
„,      0.89
canel   0.88
myn     0.88
dises   0.88
Activations Density: 0.244%