INDEX
Explanations
comparisons between natural and chemical substances, as well as mentions of age groups and specific gender identities
Neuron Alignment
Index   Value   % of L₁
1343    +0.11   0.3%
198     +0.10   0.3%
1967    +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
3       +0.11      0.04
1473    +0.10      0.02
1806    +0.08      0.03
Negative Logits
fuf       -1.67
reluct    -1.65
Intere    -1.61
disagre   -1.60
depic     -1.59
desir     -1.56
increa    -1.55
inev      -1.53
emphat    -1.53
?...      -1.52
Positive Logits
otherwise        0.65
setOpaque        0.62
ഉ                0.61
alike            0.61
بالإنجليزية      0.60
بالإ             0.60
ones             0.60
película         0.59
viewWillAppear   0.59
beyond           0.59
Activations Density: 0.140%