Explanations
mentions of educational achievements and aspirations
Neuron Alignment
Index    Value    % of L₁
297      +0.10    0.3%
509      +0.09    0.3%
738      +0.09    0.3%
Correlated Neurons
Index    Pearson Corr.    Cos Sim.
1511     +0.10            0.01
761      +0.09            0.04
1915     +0.09            0.04
Negative Logits
Token     Value
glab      -0.72
bax       -0.72
ù         -0.72
mef       -0.71
Bibl      -0.70
cyr       -0.69
pellic    -0.69
anhyd     -0.68
istr      -0.67
tuc       -0.67
Positive Logits
Token          Value
<bos>          0.79
graduating     0.78
graduation     0.73
majoring       0.71
graduate       0.70
student        0.69
academically   0.69
studying       0.66
academic       0.66
semester       0.66
Activations Density: 0.566%