INDEX
Explanations
phrases related to winning outcomes, particularly in the context of competitions or games
Neuron Alignment
Index    Value    % of L₁
50       +0.25    1.2%
1961     +0.10    0.5%
405      +0.08    0.4%
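The "% of L₁" column appears to report each neuron's share of the vector's total absolute weight; the listed rows are consistent with an L₁ norm of roughly 20 (for example, 0.10 / 0.5% = 20). A minimal sketch under that assumption follows. The function name `l1_share` and the toy vector are hypothetical and are not taken from the dashboard.

```python
def l1_share(weights, index):
    """Return |weights[index]| as a fraction of the vector's L1 norm.

    Assumption: "% of L1" means |w_i| / sum_j |w_j|. The dashboard does not
    state its formula; this is an illustrative reading only.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("weight vector has zero L1 norm")
    return abs(weights[index]) / l1_norm


# Hypothetical toy vector, not data from the dashboard.
toy_weights = [0.25, -0.10, 0.08, 0.07]
print(f"{l1_share(toy_weights, 0):.1%}")
```

Taking absolute values means a strongly negative weight counts toward the share just as a positive one does.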
Correlated Neurons
Index    P. Corr.    Cos Sim.
1526     +0.25       0.05
1652     +0.10       0.04
1038     +0.08       0.05
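The two columns above measure different things: Pearson correlation subtracts each vector's mean before comparing, while cosine similarity compares the raw vectors. The sketch below shows the standard definitions of both. Which vectors the dashboard actually compares (activations, weights, or something else) is not stated, so the inputs here are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centred vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])


# Hypothetical vectors, not data from the dashboard.
u = [1.0, 2.0, 3.0]
v = [11.0, 12.0, 13.0]
print(round(pearson_correlation(u, v), 3), round(cosine_similarity(u, v), 3))
```

Because of the mean subtraction, the two measures can differ noticeably for the same pair of vectors, which is one possible reason a row could show a higher correlation than cosine similarity.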
Negative Logits
Token            Logit
<bos>            -3.00
HasIndex         -0.73
//               -0.73
/*               -0.72
/**              -0.71
/**              -0.70
public           -0.70
const            -0.68
//#              -0.67
HasAnnotation    -0.67
Positive Logits
Token        Logit
impra        2.17
increa       2.08
maneu        2.08
affor        2.03
Minang       1.94
Juf          1.86
inev         1.84
accla        1.82
stockholm    1.81
disagre      1.79
Activations Density 0.226%
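An activation density figure like this is typically the fraction of sampled tokens on which the feature fires at all. The sketch below assumes "fires" means an activation strictly above a threshold of zero. The dashboard does not state its threshold or sample, so `activation_density` and its inputs are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation is strictly above `threshold`.

    Assumption: density = (number of active tokens) / (number of tokens).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 1 of 4 tokens.
sample = [0.0, 0.0, 0.0, 1.5]
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.226% would correspond to the feature being active on roughly 2 of every 1,000 sampled tokens.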