INDEX
Explanations
descriptive adjectives related to physical attributes and characteristics
Neuron Alignment
Index    Value    % of L₁
433      +0.11    0.6%
279      +0.11    0.6%
37       +0.10    0.6%
Correlated Neurons
Index    P. Corr.    Cos Sim.
433      +0.11       0.06
481      +0.11       0.09
37       +0.10       0.06
Negative Logits
Inflater         -1.50
icle             -1.48
ADVERTISEMENT    -1.48
surroundings     -1.44
conveyed         -1.41
life             -1.40
adem             -1.38
terday           -1.37
raf              -1.37
blogger          -1.35
Positive Logits
)\            1.40
ROC           1.37
gated         1.36
charter       1.35
FDR           1.35
âĢī           1.33
Bonferroni    1.32
Elect         1.31
seat          1.28
clustered     1.28
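The token shown as âĢī in the positive logits looks like a byte-level BPE display string rather than readable text. The sketch below shows one way to turn such strings back into ordinary characters. It assumes the model uses a GPT-2 style byte-level tokenizer, which the page does not state; the helper name decode_token is hypothetical and is not part of any dashboard API.

```python
def bytes_to_unicode():
    """Byte -> printable-character table used by GPT-2 style byte-level BPE.

    Assumption: the dashboard's model uses this tokenizer family.
    """
    # Bytes that are already printable map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to code point 256 + n, in byte order.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: display character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: map a display token back to the text it encodes."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["\u00e2\u0122\u012b", "ROC", "Bonferroni"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, âĢī corresponds to the bytes E2 80 89, which is the UTF-8 encoding of U+2009 (thin space), while plain ASCII tokens such as ROC decode to themselves.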
Activations Density 1.608%