INDEX
Explanations
phrases related to denials, opposition, and intentions
Neuron Alignment
Index   Value   % of L₁
208     +0.10   0.3%
453     +0.09   0.3%
1042    +0.07   0.2%
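This is a minimal sketch of how a Neuron Alignment table like this one could be produced. It assumes two things the page does not state. First, that "Value" is the feature's decoder weight on each MLP neuron. Second, that "% of L₁" is that weight's absolute value as a share of the decoder vector's L₁ norm. The helper `neuron_alignment` and the toy vector are hypothetical.

```python
def neuron_alignment(decoder, k=3):
    """Return the k neurons with the largest positive decoder weights.

    Hypothetical helper: assumes `decoder` is the feature's decoder
    vector over MLP neurons, indexed by neuron number.
    """
    # L1 norm of the whole decoder vector (sum of absolute weights).
    l1 = sum(abs(w) for w in decoder)
    # Rank neuron indices by signed weight, largest first.
    top = sorted(range(len(decoder)), key=lambda i: decoder[i], reverse=True)[:k]
    # Each row: (neuron index, weight, share of the L1 norm in percent).
    return [(i, decoder[i], 100.0 * abs(decoder[i]) / l1) for i in top]


# Usage with a toy five-neuron decoder vector (invented values).
for idx, value, pct in neuron_alignment([0.02, -0.01, 0.10, 0.05, 0.03]):
    print(f"{idx:<6}{value:+.2f}  {pct:.1f}%")
```

A real decoder spans thousands of neurons. That is why even the top entries in the table above account for well under 1% of the L₁ norm.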
Correlated Neurons
Index   Pearson Corr.   Cos. Sim.
208     +0.10           0.07
438     +0.09           0.06
392     +0.07           0.03
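The Correlated Neurons table reports two numbers for each neuron: a correlation and a cosine similarity. The sketch below implements both statistics under stated assumptions. The correlation is taken to be Pearson's r between the feature's activations and a neuron's activations over the same tokens. The cosine similarity is shown for two generic vectors, because the page does not say which weight vectors it compares. The function names and toy activations are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length activation sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Covariance numerator and the two standard-deviation terms.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cosine(u, v):
    """Cosine similarity between two vectors of the same length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Usage: toy feature and neuron activations over five tokens (invented values).
feature_acts = [0.0, 1.2, 0.0, 3.4, 0.5]
neuron_acts = [0.1, 0.9, -0.2, 2.8, 0.4]
print(round(pearson(feature_acts, neuron_acts), 3))
```

The two numbers can disagree. A neuron can co-fire with the feature (high correlation) even when its weight vector points in a different direction (low cosine similarity), and the reverse can also happen.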
Negative Logits
bonbons       -0.58
createSlice   -0.57
jette         -0.57
<%@           -0.57
læng          -0.56
tiens         -0.55
appContext    -0.55
rempliss      -0.54
réuss         -0.54
serai         -0.53
Positive Logits
whatsoever    0.56
anything      0.53
nor           0.51
wrongdoing    0.51
anything      0.49
any           0.49
Autoritní     0.48
intend        0.47
dafx          0.46
makro         0.46
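The negative and positive logit lists rank vocabulary tokens by how strongly the feature pushes their logits down or up. This minimal sketch assumes the feature's output direction has already been mapped into the residual stream. For a feature defined over MLP neurons, that would mean multiplying its decoder vector by the MLP output weights first. Under that assumption, each token's score is the dot product of the direction with that token's unembedding vector. The page may also apply a normalization that this sketch omits. The helper `logit_weights`, the token names, and the vectors are hypothetical toy values.

```python
def logit_weights(direction, unembed, vocab, k=3):
    """Return the k most negative and k most positive token effects.

    Hypothetical helper. Assumes `direction` is the feature's output
    direction in the residual stream and `unembed[t]` is the
    unembedding vector for token `vocab[t]`.
    """
    # Score every token by the dot product with the feature direction.
    scores = [
        (vocab[t], sum(d * u for d, u in zip(direction, unembed[t])))
        for t in range(len(vocab))
    ]
    ranked = sorted(scores, key=lambda pair: pair[1])
    negative = ranked[:k]          # most suppressed tokens first
    positive = ranked[::-1][:k]    # most promoted tokens first
    return negative, positive


# Usage with a toy 2-dimensional residual stream and four tokens.
vocab = ["tok0", "tok1", "tok2", "tok3"]
unembed = [[0.5, 0.1], [0.4, 0.2], [-0.6, 0.0], [-0.5, 0.1]]
neg, pos = logit_weights([1.0, 0.5], unembed, vocab, k=2)
print(neg)
print(pos)
```

Both lists come from the same sorted scores. The two ends of one ranking give the suppressed and promoted tokens.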
Activations Density 0.338%
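An activation density of 0.338% means the feature fires on roughly 3.4 of every 1,000 tokens, or about 1 token in 296. This minimal sketch computes that percentage. It assumes a token counts as "active" when the feature's activation is strictly above zero, a common convention for ReLU-based sparse autoencoders that the page does not confirm. The helper name and toy activations are hypothetical.

```python
def activation_density(acts, threshold=0.0):
    """Percentage of tokens on which the feature is active.

    Assumes a token counts as active when its activation exceeds
    `threshold` (0.0 by default).
    """
    active = sum(1 for a in acts if a > threshold)
    return 100.0 * active / len(acts)


# Usage: 1,000 toy tokens, of which 3 have a nonzero activation.
acts = [0.0] * 997 + [1.5, 0.2, 3.1]
print(activation_density(acts))
```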