INDEX
Explanations: file download related terms
Neuron Alignment
Index    Value    % of L₁
2019     +0.13    0.4%
184      +0.13    0.4%
453      +0.11    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
184      +0.13       0.02
536      +0.13       0.02
605      +0.11       0.01
Negative Logits
Token             Logit
abetes            -0.60
فريبيس            -0.58
<<<<<<<<<<<<<<    -0.57
Abbiamo           -0.55
سكانية            -0.54
turi              -0.54
Được              -0.53
новниш            -0.53
cajones           -0.53
хьтан             -0.52
Positive Logits
Token              Logit
embodi             1.09
lola               0.93
contex             0.91
starbucks          0.89
panama             0.88
lts                0.88
sophie             0.87
ACKNOWLEDGMENTS    0.87
TheGreat           0.86
tiffany            0.86
Activations Density 0.173%