Explanations
phrases related to specific technical information or model names
Neuron Alignment
Index   Value   % of L₁
1482    +0.15   0.6%
484     +0.13   0.5%
489     +0.13   0.5%
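The page does not define the "% of L₁" column. One plausible reading, assumed here, is that it gives each component's absolute value as a share of the L₁ norm of the feature's direction vector. The three rows are consistent with that reading: 0.15 at 0.6% implies an L₁ norm of about 25, and 0.13 / 25 is about 0.5%. The sketch below illustrates that assumed calculation on a made-up toy vector; `top_aligned` is a hypothetical helper, not part of any dashboard API.

```python
def top_aligned(direction, k=3):
    """Return (index, value, share of L1 norm) for the k largest-magnitude entries.

    Assumption: "% of L1" means |value| / sum(|all values|). The source page
    does not state this definition.
    """
    l1 = sum(abs(v) for v in direction)
    if l1 == 0:
        raise ValueError("direction vector has zero L1 norm")
    # Sort component indices by descending absolute value and keep the top k.
    order = sorted(range(len(direction)), key=lambda i: -abs(direction[i]))[:k]
    return [(i, direction[i], abs(direction[i]) / l1) for i in order]


# Toy vector for illustration only, not the real feature direction.
toy = [0.5, -0.25, 0.125, 0.125]
rows = top_aligned(toy, k=2)
for index, value, share in rows:
    print(f"{index:>5}  {value:+.2f}  {share:.1%}")
```

Multiplying the share by 100 and rounding to one decimal place gives figures in the same form as the table's third column.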
Correlated Neurons
Index   P. Corr.   Cos Sim.
484     +0.15      0.03
11      +0.13      0.03
489     +0.13      0.03
Negative Logits
miyor      -0.85
iyor       -0.75
miyorum    -0.67
mişti      -0.63
artney     -0.61
diğimiz    -0.58
diğ        -0.56
Allister   -0.56
diğinde    -0.56
diğini     -0.55
Positive Logits
Mezz         0.97
mezz         0.95
Me           0.95
Me           0.94
ecru         0.91
Megal        0.90
compréhen    0.90
mef          0.88
Meg          0.87
MEG          0.86
Activations Density 0.172%