INDEX
Explanations
mentions of college-related terms
Neuron Alignment
Index  Value  % of L₁
1870   +0.14  0.5%
1194   +0.12  0.5%
555    +0.12  0.4%
Correlated Neurons
Index  Pearson Corr.  Cosine Sim.
1194   +0.14          0.03
1464   +0.12          0.03
1971   +0.12          0.03
Negative Logits
Token     Logit
curé      -0.68
lapin     -0.58
bacio     -0.57
WALTZ     -0.57
guardare  -0.57
beverly   -0.56
loup      -0.56
fendi     -0.55
prêtre    -0.55
dakota    -0.54
Positive Logits
Token       Logit
college     1.48
College     1.34
college     1.34
College     1.29
colleges    1.24
COLLEGE     1.13
COLLEGE     1.06
Colleges    1.06
collegiate  0.73
campus      0.73
Activations Density: 0.065%