INDEX
Explanations
special characters or symbols, particularly those used in programming or encoding contexts
Neuron Alignment
Index   Value   % of L₁
97      +0.14   0.8%
427     +0.12   0.7%
302     +0.12   0.7%
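The "% of L₁" column is not defined on this page. One plausible reading, treated here as an assumption, is that it gives each neuron's absolute weight as a share of the L₁ norm of the feature's full weight vector. A minimal sketch under that assumption, using a hypothetical helper `l1_share` and toy numbers that are not taken from this page:

```python
def l1_share(weights, index):
    """Return |weights[index]| as a fraction of the vector's L1 norm.

    Assumption: this mirrors what the "% of L1" column reports;
    the page itself does not state the formula.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; share is undefined")
    return abs(weights[index]) / l1_norm


# Hypothetical toy vector for illustration only.
toy_weights = [0.5, -0.25, 0.25]
print(f"{l1_share(toy_weights, 0):.1%}")
```

Under this reading, a small percentage for the top-ranked neurons would mean the feature's weight is spread across many neurons rather than concentrated in a few.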
Correlated Neurons
Index   P. Corr.   Cos Sim.
74      +0.14      0.07
391     +0.12      0.08
85      +0.12      0.05
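The page does not spell out the two column headers. Assuming "P. Corr." means the Pearson correlation between two activation series and "Cos Sim." means the cosine similarity between two weight directions, the standard definitions can be sketched as follows. The function names and inputs are illustrative and not part of the dashboard:

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

The two measures differ only in that Pearson correlation subtracts each series' mean first, so they can disagree when the inputs are not centred on zero.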
Negative Logits
anas      -1.59
orman     -1.50
night     -1.44
called    -1.43
Station   -1.43
located   -1.42
porter    -1.42
helf      -1.41
culo      -1.40
ellers    -1.40
Positive Logits
regard    2.00
regards   1.85
Âĵ        1.50
respect   1.44
linger    1.43
ĻĤ        1.42
mold      1.39
OUN       1.39
itives    1.37
hood      1.34
Activations Density: 2.614%