INDEX
Explanations
Mentions of the name "Rod" and longer names that begin with it (e.g., Rod, Roddericks, Rodgers)
Neuron Alignment
Index   Value   % of L₁
893     +0.11   0.4%
1339    +0.09   0.3%
506     +0.08   0.3%
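The page does not say how the "% of L₁" column is computed. One plausible reading, offered here only as an assumption, is that each value is divided by the L₁ norm (sum of absolute values) of the whole per-neuron vector. The sketch below illustrates that reading; the function name `top_aligned_neurons` and the toy input are hypothetical and not taken from the source.

```python
import numpy as np

def top_aligned_neurons(direction, k=3):
    """Return (index, value, share of L1 norm) for the k largest-magnitude entries.

    `direction` is a hypothetical 1-D vector with one entry per neuron.
    The share-of-L1 definition is an assumption, not confirmed by the page.
    """
    direction = np.asarray(direction, dtype=float)
    l1_norm = np.abs(direction).sum()
    # Rank neurons by absolute value, largest first.
    order = np.argsort(-np.abs(direction), kind="stable")[:k]
    return [
        (int(i), float(direction[i]), float(abs(direction[i]) / l1_norm))
        for i in order
    ]

# Toy vector, not real model data.
rows = top_aligned_neurons([1.0, -6.0, 3.0], k=2)
for index, value, share in rows:
    print(f"{index:>5}  {value:+.2f}  {share:.1%}")
```

Under this reading, small percentages such as those in the table would indicate that no single neuron dominates the vector.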
Correlated Neurons
Index   P. Corr.   Cos Sim.
893     +0.11      0.03
1120    +0.09      0.03
1614    +0.08      0.03
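For readers unfamiliar with the two column headers, the sketch below shows how a Pearson correlation and a cosine similarity are each computed for a pair of vectors. It assumes "P. Corr." abbreviates Pearson correlation. The page does not state which vectors are being compared in each column, so the inputs here are toy values and `pearson_and_cosine` is a hypothetical helper.

```python
import numpy as np

def pearson_and_cosine(x, y):
    """Return (Pearson correlation, cosine similarity) of two 1-D vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pearson correlation: covariance of mean-centred vectors, scaled to [-1, 1].
    pearson = np.corrcoef(x, y)[0, 1]
    # Cosine similarity: dot product of the raw vectors over their norms.
    cosine = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return float(pearson), float(cosine)

# Toy inputs: orthogonal vectors that are perfectly anti-correlated.
print(pearson_and_cosine([1.0, 0.0], [0.0, 1.0]))
```

The two measures differ because Pearson correlation subtracts each vector's mean first, while cosine similarity does not, so the same pair can score very differently on each.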
Negative Logits
Token            Logit
<bos>            -0.99
(blank)          -0.70
/*               -0.63
ⓧ                -0.61
kív              -0.60
LabelTagHelper   -0.59
прі              -0.59
فى               -0.56
endorse          -0.56
прий             -0.55
Positive Logits
Token     Logit
Rod       1.98
Rod       1.77
ROD       1.40
ekos      1.21
hcm       1.19
kafe      1.18
kasa      1.17
obé       1.17
keramik   1.17
mef       1.17
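The page does not describe how the negative and positive logit lists are produced. A minimal sketch, assuming there is one logit-effect score per vocabulary token (for example, from projecting a feature's output direction through the model's unembedding matrix), is to sort the scores and take both ends. The function `top_logit_tokens`, the toy vocabulary, and the scores are all hypothetical.

```python
import numpy as np

def top_logit_tokens(logit_effects, vocab, k=10):
    """Return (most negative, most positive) token lists, k entries each.

    `logit_effects[i]` is the assumed effect on the logit of `vocab[i]`.
    """
    logit_effects = np.asarray(logit_effects, dtype=float)
    ascending = np.argsort(logit_effects, kind="stable")
    # Most negative first.
    negative = [(vocab[i], float(logit_effects[i])) for i in ascending[:k]]
    # Most positive first.
    positive = [(vocab[i], float(logit_effects[i])) for i in ascending[::-1][:k]]
    return negative, positive

# Toy four-token vocabulary, not real model data.
toy_vocab = ["a", "b", "c", "d"]
toy_effects = [0.5, -1.0, 2.0, -0.2]
neg, pos = top_logit_tokens(toy_effects, toy_vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```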
Activations Density: 0.265%