INDEX
Explanations
No Explanations Found
Neuron Alignment
Index   Value   % of L₁
674     +0.33   1.4%
823     +0.05   0.2%
831     +0.04   0.2%
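The "% of L₁" column appears to express each listed entry's magnitude as a share of the L₁ norm of the feature's per-neuron weight vector. A minimal sketch, assuming the dashboard ranks the entries of a decoder-style weight vector by absolute value and divides each by the vector's L₁ norm; `top_alignment` and the toy vector are hypothetical and are not this feature's actual weights.

```python
def top_alignment(weights, k=3):
    """Return (index, value, percent_of_l1) for the k largest-magnitude entries.

    `weights` is a hypothetical per-neuron vector (e.g. one feature's decoder
    row). The percentage is |value| / sum(|w|) * 100.
    """
    l1 = sum(abs(w) for w in weights)
    # Rank neuron indices by absolute weight, largest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], 100.0 * abs(weights[i]) / l1) for i in order[:k]]


# Toy 6-dimensional vector, purely illustrative.
toy = [0.10, -0.40, 0.05, 0.25, 0.0, -0.20]
for idx, val, pct in top_alignment(toy):
    print(f"{idx:>3}  {val:+.2f}  {pct:.1f}%")
```

Because the percentages are taken against the whole vector's L₁ norm, a small share such as 1.4% for the top entry indicates that the weight mass is spread over many neurons rather than concentrated in one.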
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
2       -0.33           0.00
0       -0.05           0.00
1       -0.04           0.00
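The two similarity columns can be illustrated with plain formulas. A minimal sketch, assuming "Pearson Corr." and "Cosine Sim." are both computed between the feature's and a neuron's activations over the same tokens; the helper names and activation lists below are hypothetical. The example shows why the two measures can disagree: Pearson correlation subtracts each series' mean first, while cosine similarity does not.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length activation series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # Design choice of this sketch: a constant series yields 0.0, not an error.
    return cov / (sx * sy) if sx and sy else 0.0


def cosine(x, y):
    """Cosine similarity of two equal-length vectors (no mean-centering)."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


# Hypothetical activations: the neuron fires exactly where the feature does not.
feature = [0.0, 0.0, 1.0, 0.0]
neuron = [1.0, 1.0, 0.0, 1.0]
print(f"pearson={pearson(feature, neuron):.2f} cosine={cosine(feature, neuron):.2f}")
```

Here the centered measure is strongly negative while the uncentered dot product, and therefore the cosine similarity, is zero.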
Negative Logits
treachery      -0.88
untenable      -0.88
traitors       -0.88
odious         -0.87
demoral        -0.86
massacres      -0.85
blasphemy      -0.85
cowardice      -0.84
ruinous        -0.83
disgraceful    -0.82
Positive Logits
<bos>          10.60
betweenstory   1.89
expandindo     1.87
dispen         1.86
Autoritní      1.84
GEBURTSDATUM   1.82
ordina         1.58
'\\;'          1.57
تقاوى          1.56
»>             1.54
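In SAE dashboards of this kind, the positive and negative logit lists are typically obtained by projecting the feature's decoder direction through the model's unembedding matrix and keeping the extreme entries. A minimal sketch of that projection, assuming a 3-dimensional decoder vector and a four-token toy vocabulary; `top_logits`, `decoder`, `unembed`, and `vocab` are hypothetical names and values, not the dashboard's actual code or this model's weights.

```python
def top_logits(decoder, unembed, vocab, k=2):
    """Project a feature direction onto each token's unembedding column.

    decoder: list of d floats (hypothetical feature decoder direction).
    unembed: d x V nested list (hypothetical toy unembedding matrix W_U).
    vocab:   list of V token strings.
    Returns (most_negative, most_positive), each a list of (token, logit).
    """
    d, v = len(decoder), len(vocab)
    # logit[j] = sum_i decoder[i] * W_U[i][j]
    logits = [sum(decoder[i] * unembed[i][j] for i in range(d)) for j in range(v)]
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    return ranked[:k], ranked[::-1][:k]


decoder = [1.0, -0.5, 0.0]
unembed = [
    [0.2, -1.0, 0.5, 0.0],
    [0.4, 0.0, -1.0, 1.0],
    [9.0, 9.0, 9.0, 9.0],  # ignored here because decoder[2] is 0.0
]
vocab = ["alpha", "beta", "gamma", "delta"]
neg, pos = top_logits(decoder, unembed, vocab)
print("negative:", neg)
print("positive:", pos)
```

Only the direction's dot product with each token column matters, so a dimension where the decoder weight is zero contributes nothing regardless of how large the unembedding entries are.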
Activation Density: 0.000%
No Known Activations
This feature has no known activations.