Explanations
phrases related to cause and effect or consequences
Neuron Alignment
Index   Value   % of L₁
468     +0.12   0.4%
1363    +0.12   0.4%
1233    +0.12   0.4%
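As a rough illustration of how a "% of L₁" figure like the ones listed under Neuron Alignment could be derived, the sketch below assumes it is a neuron's absolute weight divided by the sum of absolute weights (the L₁ norm) of the whole vector. That definition is an assumption, not something this page states, and `l1_share` is a hypothetical helper. Under that assumption, a value of 0.12 at 0.4% would imply an L₁ norm of roughly 30.

```python
def l1_share(weights, index):
    """Return the percentage of the vector's L1 norm contributed by one entry.

    Assumed definition: 100 * |w_i| / sum_j |w_j|.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return 100.0 * abs(weights[index]) / l1_norm


# Toy vector (made-up numbers, not taken from the dashboard).
toy_weights = [0.12, -0.38, 0.50]
share_of_first = l1_share(toy_weights, 0)
```

Because the share uses absolute values, a strongly negative weight counts as much as a strongly positive one.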
Correlated Neurons
Index   P. Corr.   Cos Sim.
1133    +0.12      0.03
1233    +0.12      0.03
468     +0.12      0.02
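The Pearson correlation and cosine similarity columns can differ for the same neuron because the two measures are not the same quantity: Pearson correlation is the cosine similarity of the two series after each has had its mean subtracted. The sketch below shows both on toy data. It does not claim to reproduce which vectors the dashboard compares, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation, computed as cosine similarity of centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Toy data (made up): perfectly anti-correlated, yet the raw cosine is positive.
a = [1.0, 2.0, 3.0]
b = [3.0, 2.0, 1.0]
corr = pearson_correlation(a, b)
cos = cosine_similarity(a, b)
```

In this toy case the correlation is negative while the cosine similarity is positive, which shows why the two columns need not agree.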
Negative Logits
Salta     -0.60
Jörg      -0.58
inol      -0.56
Buona     -0.56
HSSF      -0.55
interro   -0.54
Molto     -0.53
Bisa      -0.52
Oltre     -0.51
Junho     -0.50
Positive Logits
consequences    1.20
Consequences    1.08
consequence     1.02
Consequences    0.94
CONSEQU         0.87
repercussions   0.78
consecuencias   0.76
implications    0.73
ramifications   0.73
konsek          0.72
Activations Density 0.081%