INDEX
Explanations
responses to interview questions or discussions
Neuron Alignment
Index    Value    % of L₁
1445     +0.09    0.3%
872      +0.08    0.2%
381      +0.07    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1134     +0.09       0.02
924      +0.08       0.03
1445     +0.07       0.03
Negative Logits
inappro         -0.82
Kün             -0.82
pamph           -0.78
sophistic       -0.77
dises           -0.72
schoolmaster    -0.71
caprice         -0.71
Schrö           -0.71
squa            -0.70
Frö             -0.70
Positive Logits
How           0.73
виправивши    0.70
What          0.69
How           0.68
Does          0.66
Does          0.65
What          0.63
***!          0.62
urm           0.61
Are           0.61
Activations Density: 0.105%