INDEX
Explanations
references to the word "Man" and its various forms
Neuron Alignment
Index   Value   % of L₁
50      +0.19   0.9%
67      +0.15   0.8%
2011    +0.15   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1516    +0.19      0.03
67      +0.15      0.03
1520    +0.15      0.03
Negative Logits
Token       Logit
<bos>       -1.99
harmed      -0.66
resolve     -0.65
break       -0.59
get         -0.58
INVISIBLE   -0.57
[blank]     -0.56
resolve     -0.56
[blank]     -0.55
find        -0.55
Positive Logits
Token       Logit
ftu         1.42
Mémoires    1.38
Cfr         1.36
Juf         1.36
Bartholo    1.35
Abbé        1.35
fup         1.34
ftre        1.33
fep         1.33
xxv         1.33
Activations Density: 0.117%