INDEX
Explanations
mentions of the word "honor" or "honour"
Neuron Alignment
Index   Value   % of L₁
1065    +0.16   0.6%
897     +0.15   0.6%
596     +0.13   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1065    +0.16      0.03
1490    +0.15      0.03
596     +0.13      0.02
Negative Logits
habet       -0.56
potest      -0.52
caufe       -0.48
quæ         -0.47
씬          -0.47
smtplib     -0.46
projekty    -0.46
Controllo   -0.45
cifix       -0.45
perature    -0.44
Positive Logits
honor       1.25
Honor       1.21
Honor       1.19
HONOR       1.16
honors      1.11
honored     1.09
honour      1.09
honor       1.08
honoring    1.04
Honors      1.00
Activations Density: 0.090%