INDEX
Explanations
content related to entertainment
Neuron Alignment
Index    Value    % of L₁
156      +0.29    1.7%
365      +0.12    0.7%
376      +0.12    0.7%
Correlated Neurons
Index    P. Corr.    Cos Sim.
258      +0.29       0.03
271      +0.12       0.01
203      +0.12       0.02
Negative Logits
ĥ½       -2.61
Į        -2.23
§        -2.20
ĨĴ       -2.13
ľĵ       -2.11
į        -2.10
yours    -1.97
apine    -1.90
½        -1.87
hers     -1.87
Positive Logits
80211    1.82
ister    1.59
"}](#    1.50
istry    1.49
Box      1.47
ious     1.42
isty     1.39
Area     1.38
ist      1.38
area     1.37
Activations Density: 3.669%