INDEX
Explanations
examples of various concepts, such as contradictions, cryptocurrency, and safety features in vehicles
Neuron Alignment
Index   Value   % of L₁
1127    +0.10   0.3%
30      +0.10   0.3%
1052    +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1052    +0.10      0.04
581     +0.10      0.04
1127    +0.10      0.03
Negative Logits
Token         Logit
aphthalene    -0.64
idolat        -0.61
Cringe        -0.59
Hahahahaha    -0.58
útbol         -0.58
inexorable    -0.58
lepiej        -0.57
fusca         -0.57
felicity      -0.57
Diction       -0.56
Positive Logits
Token       Logit
example     1.14
examples    1.08
Example     1.02
example     1.01
Example     0.98
Examples    0.96
examples    0.95
Examples    0.89
exemple     0.82
EXAMPLE     0.81
Activations Density: 0.079%