INDEX
Explanations
timestamps and textual excerpts from online conversations
Neuron Alignment
Index  Value  % of L₁
1150   +0.16  0.5%
1343   +0.15  0.5%
906    +0.10  0.3%
Correlated Neurons
Index  P. Corr.  Cos Sim.
1343   +0.16     0.04
1429   +0.15     0.04
1431   +0.10     0.04
Negative Logits
surpl    -0.69
redé     -0.68
renou    -0.67
simplif  -0.65
doubl    -0.57
Sigue    -0.55
hairc    -0.52
plonge   -0.52
Haci     -0.52
Estar    -0.52
Positive Logits
ambientale  0.48
orded       0.46
Paglinawan  0.45
sogget      0.44
stination   0.44
.$_         0.44
ਾਨ          0.44
TimeUnit    0.44
worin       0.43
horesis     0.43
Activations Density: 0.121%