Explanations
proper nouns related to a specific individual named Martin
Neuron Alignment
Index   Value   % of L₁
32      +0.23   1.0%
555     +0.13   0.6%
1177    +0.12   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
32      +0.23      0.03
555     +0.13      0.02
629     +0.12      0.02
Negative Logits
totalPrice   -0.51
userEmail    -0.50
angebra      -0.50
postData     -0.49
Allister     -0.49
pageNo       -0.49
aufgehoben   -0.49
erklär       -0.47
julie        -0.47
rakech       -0.47
Positive Logits
Martin    1.47
Martin    1.42
martin    1.34
MARTIN    1.29
MARTIN    1.29
Martins   1.18
Martín    1.13
martin    1.04
mart      1.01
Martín    0.93
Activations Density 0.094%