Explanations
words related to battles, fights, and conflicts
Neuron Alignment
Index   Value   % of L₁
549     +0.10   0.3%
1056    +0.08   0.2%
1942    +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
549     +0.10      0.04
1056    +0.08      0.04
1041    +0.08      0.03
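The two Correlated Neurons columns can be read as standard similarity measures. The sketch below assumes that "P. Corr." is the Pearson correlation between two activation series and that "Cos Sim." is the cosine similarity between two weight vectors; the page itself does not define either column. The function names and the toy inputs are hypothetical and are not taken from the dashboard.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation between two equal-length activation series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, left unnormalised because the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def cosine_similarity(u, v):
    """Cosine of the angle between two weight vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy data (hypothetical): activations of a feature and of one neuron.
feature_acts = [0.0, 0.0, 1.2, 0.0, 0.7, 0.0]
neuron_acts = [0.1, 0.0, 0.9, 0.2, 0.3, 0.1]
print(f"P. Corr.: {pearson_correlation(feature_acts, neuron_acts):+.2f}")
print(f"Cos Sim.: {cosine_similarity([1.0, 0.0, 2.0], [0.5, 1.0, 1.0]):.2f}")
```

Under these assumptions, values as small as +0.10 and 0.04 would indicate that no single neuron tracks the feature closely.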
Negative Logits
Token            Logit
FormState        -0.53
Place            -0.53
琥               -0.47
place            -0.45
Aus              -0.45
samt             -0.45
加盟             -0.44
model            -0.44
MODEL            -0.43
ColumnHeaders    -0.43
Positive Logits
Token            Logit
swarovski        1.25
peppa            1.09
embodi           1.07
lidl             1.06
michelin         1.03
nutella          1.03
tiffany          1.02
eiffel           1.00
scrat            0.98
stockholm        0.98
Activations Density 0.210%
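A density figure like this one can be reproduced from raw activations. The sketch below assumes that density means the fraction of sampled tokens on which the feature's activation is above zero; the page does not state the definition or the sample size. The function name, the threshold parameter, and the toy sample of 10,000 tokens are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy sample (hypothetical): 10,000 tokens, 21 of which activate the feature.
sample = [0.0] * 9979 + [1.5] * 21
print(f"Activations Density {activation_density(sample):.3%}")
```

Under that assumption, a density of 0.210% corresponds to roughly 21 active tokens per 10,000 sampled.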