INDEX
Explanations
mentions of products, features, and benefits in text related to various industries and domains
Neuron Alignment
Index   Value   % of L₁
478     +0.10   0.3%
76      +0.08   0.2%
1553    +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
49      +0.10      0.03
1415    +0.08      0.02
298     +0.08      0.03
Negative Logits
panik     -1.09
thuy      -1.03
saar      -0.97
kac       -0.97
kasa      -0.96
Strukt    -0.92
Kategor   -0.90
meis      -0.90
Kese      -0.88
gymnas    -0.88
Positive Logits
tupperware       0.77
ecru             0.72
velour           0.70
Dizziness        0.69
nutella          0.64
differentiable   0.63
hairc            0.62
mascarpone       0.61
tufted           0.60
paisley          0.60
Activations Density 0.237%