INDEX
Explanations
medical or technical terms
Neuron Alignment
Index   Value   % of L₁
184     +0.30   1.2%
764     +0.27   1.1%
137     +0.19   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
184     +0.30      0.03
137     +0.27      0.05
764     +0.19      0.04
Negative Logits
<bos>              -0.67
дописавши          -0.63
bewerken           -0.57
Попис              -0.55
/***               -0.53
InstrumentedTest   -0.52
intios             -0.51
FormTagHelper      -0.51
Wiktionnaire       -0.51
Viitteet           -0.50
Positive Logits
Wtf      0.79
🥲       0.79
Lmao     0.73
lmfao    0.72
🤣🤣     0.71
🙃       0.69
😭😭     0.68
😬       0.66
Minang   0.66
🤦       0.64
Activations Density: 0.331%