INDEX
Explanations
phrases indicating the addition of information or the expressing of an opinion
Neuron Alignment
Index   Value   % of L₁
32      +0.13   0.4%
568     +0.12   0.4%
1265    +0.10   0.3%
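The source does not define the "% of L₁" column. One plausible reading, assumed here, is that it gives each neuron's share of the L₁ norm (sum of absolute values) of the feature's weight vector. The sketch below illustrates that reading on made-up toy values, not on this feature's weights; the function name `l1_shares` is hypothetical.

```python
def l1_shares(values):
    """Return each entry's share of the vector's L1 norm, in percent.

    Assumption: "% of L1" means |v_i| / sum_j |v_j| * 100.
    """
    l1_norm = sum(abs(v) for v in values)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; shares are undefined")
    return [100.0 * abs(v) / l1_norm for v in values]


# Toy vector for illustration only (not taken from the dashboard).
toy_weights = [0.5, -0.25, 0.25]
shares = l1_shares(toy_weights)
```

Under this reading, a value of +0.13 at 0.4% would imply a total L₁ norm of roughly 0.13 / 0.004 ≈ 32 for the full vector.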
Correlated Neurons
Index   P. Corr.   Cos Sim.
32      +0.13      0.04
568     +0.12      0.04
131     +0.10      0.03
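Assuming "P. Corr." stands for Pearson correlation and "Cos Sim." for cosine similarity, the two columns can differ for the same neuron because Pearson correlation centers each vector on its mean before comparing, while cosine similarity does not. The source does not say which vectors are compared, so the sketch below uses generic toy vectors and hypothetical function names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Toy vectors for illustration only (not taken from the dashboard).
u = [1.0, 2.0, 3.0]
v = [3.0, 2.0, 1.0]
cos_uv = cosine_similarity(u, v)
corr_uv = pearson_correlation(u, v)
```

Here the two toy vectors point in broadly the same direction (positive cosine similarity) yet move in opposite directions around their means (negative Pearson correlation), which shows that the two metrics need not agree.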
Negative Logits
thermomix    -0.93
Confe        -0.79
Simult       -0.78
vito         -0.75
pican        -0.74
canel        -0.73
churrasco    -0.73
socie        -0.72
ecclesias    -0.71
Consig       -0.69
Positive Logits
added       0.95
adding      0.93
added       0.89
add         0.89
ADDED       0.85
adds        0.84
Added       0.82
adding      0.81
addition    0.79
Added       0.78
Activations Density 0.058%