INDEX
Explanations
references to the word "Rom" or related terms
Neuron Alignment
Index    Value    % of L₁
410      +0.15    0.9%
332      +0.13    0.8%
376      +0.13    0.8%
Correlated Neurons
Index    P. Corr.    Cos Sim.
332      +0.15       0.02
410      +0.13       0.01
285      +0.13       0.01
Negative Logits
rapeut           -2.10
supporting       -1.49
ctica            -1.44
warning          -1.43
investigating    -1.42
signs            -1.41
Investig         -1.40
facts            -1.40
reper            -1.40
uties            -1.38
Positive Logits
pton     1.89
ouin     1.89
µL       1.69
uet      1.67
uel      1.60
ikel     1.57
ping     1.48
imoto    1.47
gae      1.47
ilee     1.46
Activations Density 0.038%