Explanations
phrases related to articles, grammar, and linguistic analysis
Neuron Alignment
Index    Value    % of L₁
50       +0.26    0.9%
1741     +0.21    0.7%
2019     +0.14    0.5%
Correlated Neurons
Index    P. Corr.    Cos Sim.
50       +0.26       0.06
16       +0.21       0.07
1967     +0.14       0.04
Negative Logits
<bos>             -0.81
shenan            -0.72
🤣🤣              -0.72
yoda              -0.69
milf              -0.68
igts              -0.66
lmfao             -0.65
Lmao              -0.64
firebaseConfig    -0.63
😭😭              -0.63
Positive Logits
Glej         0.65
Abraço       0.62
Obrigada     0.59
excelente    0.59
Secara       0.58
Mitä         0.57
Abraços      0.56
Adapun       0.56
Köszönöm     0.55
Hogyan       0.54
Activations Density: 0.330%