Explanations
instances of the word "bed"

Neuron Alignment
Index   Value   % of L₁
1590    +0.13   0.5%
1413    +0.13   0.5%
1516    +0.13   0.5%
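
The "% of L₁" column is not defined in this listing. A minimal sketch of one plausible reading, assuming it reports each neuron's absolute weight as a share of the L1 norm of the feature's direction vector. The helper name `l1_share` and the toy vector are hypothetical and are not taken from the dashboard.

```python
def l1_share(weights):
    """Return each weight's absolute value as a fraction of the vector's L1 norm.

    Assumption: this mirrors what a "% of L1" column could mean; the source
    does not define the column.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        # Avoid division by zero for an all-zero vector.
        return [0.0 for _ in weights]
    return [abs(w) / l1_norm for w in weights]


# Hypothetical toy direction, not the feature's real weights.
toy_direction = [0.5, -0.25, 0.25]
shares = l1_share(toy_direction)

for index, share in enumerate(shares):
    print(f"neuron {index}: {share:.1%} of L1")
```

Under the same assumption, a value of 0.13 at 0.5% would put the L1 norm of this feature's direction at roughly 26. That figure is inferred from the rounded values shown and is not stated in the listing.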

Correlated Neurons
Index   P. Corr.   Cos Sim.
1590    +0.13      0.03
1413    +0.13      0.02
1516    +0.13      0.02
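
The two columns above can differ for the same neuron. A minimal sketch of why, assuming "P. Corr." abbreviates Pearson correlation and "Cos Sim." abbreviates cosine similarity. The listing does not say which vectors are being compared, so the inputs below are hypothetical toy data.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson_correlation(x, y):
    """Pearson correlation, computed as cosine similarity of mean-centered data."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    centered_x = [a - mean_x for a in x]
    centered_y = [b - mean_y for b in y]
    return cosine_similarity(centered_x, centered_y)


# Hypothetical activations: perfectly linearly related, but offset from zero.
neuron_acts = [1.0, 2.0, 3.0]
feature_acts = [5.0, 4.0, 3.0]

corr = pearson_correlation(neuron_acts, feature_acts)
cos = cosine_similarity(neuron_acts, feature_acts)
print(f"Pearson: {corr:+.2f}, cosine: {cos:+.2f}")
```

The only difference between the two measures is the mean-centering step, so they agree when both inputs have zero mean and can diverge otherwise.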

Negative Logits
Token             Logit
nehme             -0.43
ianza             -0.43
möge              -0.42
RetentionPolicy   -0.41
PhysRevLett       -0.41
Dilution          -0.41
Dizziness         -0.39
ppb               -0.39
forder            -0.39
wacht             -0.38

Positive Logits
Token   Logit
bed     1.38
Bed     1.35
Bed     1.27
BED     1.26
bed     1.22
beds    1.16
beds    1.12
Beds    1.11
BED     1.11
Beds    1.01
Activations Density 0.060%