INDEX
Explanations
conjunctions indicating contrast or exception
Neuron Alignment
Index   Value   % of L₁
699     +0.17   0.6%
674     +0.11   0.3%
1194    +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
699     +0.17      0.06
1194    +0.11      0.03
1052    +0.10      0.04
Negative Logits
Rine         -0.86
compréhen    -0.73
délib        -0.72
Ruman        -0.70
appréci      -0.69
légitime     -0.68
Timp         -0.67
mycompany    -0.67
réaliste     -0.66
Kalis        -0.65
Positive Logits
however      0.69
however      0.64
infarction   0.55
lably        0.52
nė           0.52
apetito      0.51
valsty       0.50
rehensive    0.50
doPost       0.50
riction      0.49
Activations Density 0.040%