INDEX
Explanations: universities or academic institutions
Neuron Alignment
Index   Value   % of L₁
227     +0.16   0.5%
1741    +0.16   0.5%
752     +0.14   0.4%
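The "% of L₁" column is not defined on this page. One plausible reading, stated here as an assumption, is that it gives each neuron's absolute weight as a share of the L₁ norm of the feature's full weight vector. Under that reading, a norm of about 32 would reproduce both 0.16 at 0.5% and 0.14 at 0.4% after rounding. A minimal sketch, using the hypothetical helper l1_share and a toy vector rather than the feature's real weights:

```python
def l1_share(weights):
    """Return each weight's absolute value as a percentage of the L1 norm.

    Assumption: this mirrors what the "% of L1" column reports; the page
    itself does not define the column.
    """
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        raise ValueError("L1 norm is zero; shares are undefined")
    return [100.0 * abs(w) / l1 for w in weights]


# Hypothetical toy vector, not taken from the dashboard.
toy = [0.5, -0.25, 0.25]
shares = l1_share(toy)
for w, s in zip(toy, shares):
    print(f"{w:+.2f}  {s:.1f}%")
```

Because the shares are taken over absolute values, they always sum to 100% regardless of the signs of the individual weights.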
Correlated Neurons
Index   P. Corr.   Cos Sim.
69      +0.16      0.02
227     +0.16      0.03
203     +0.14      0.02
Negative Logits
overcrow     -1.16
moistened    -1.05
intersper    -1.03
disreg       -1.01
tupperware   -0.94
lmfao        -0.94
friable      -0.94
subgoals     -0.92
cushi        -0.91
shenan       -0.89
Positive Logits
alkoh        1.69
kosme        1.67
silikon      1.59
makro        1.52
keramik      1.51
antik        1.48
kön          1.46
akut         1.45
kompati      1.44
minimalis    1.44
Activations Density: 0.061%