INDEX
Explanations
phrases related to systemic evaluation and oversight procedures
Neuron Alignment
Index    Value    % of L₁
56       +0.14    0.8%
425      +0.12    0.7%
199      +0.11    0.6%
Correlated Neurons
Index    P. Corr.    Cos Sim.
425      +0.14       0.13
259      +0.12       0.03
199      +0.11       0.04
Negative Logits
actin      -1.59
áĢº        -1.56
á̝         -1.48
odot       -1.42
=(         -1.40
ophagus    -1.39
odium      -1.37
UTERS      -1.33
astom      -1.32
haps       -1.32
Positive Logits
corner     1.55
©          1.54
ties       1.45
§          1.43
reinst     1.41
°          1.39
ļ          1.37
ĺ          1.35
[@         1.33
ĸ          1.32
Activations Density: 3.924%