INDEX
Explanations
References to places where people live, with a specific focus on "homes."
Neuron Alignment
Index   Value   % of L₁
156     +0.21   1.3%
365     +0.14   0.8%
87      +0.12   0.7%
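The page does not define the "% of L₁" column. One reading fits the rows: 0.21 at 1.3%, 0.14 at 0.8% and 0.12 at 0.7% all imply an L₁ norm of roughly 16 to 17. Under that reading, each value is one weight of the feature's decoder vector, expressed as a share of that vector's L₁ norm. The sketch below assumes this interpretation and also assumes neurons are ranked by absolute weight. `neuron_alignment` and the toy vector are hypothetical.

```python
def neuron_alignment(decoder_vec, k=3):
    """Return the k largest-magnitude weights of a feature's decoder vector.

    Assumption: "% of L1" is |weight| divided by the vector's L1 norm.
    Each result is (neuron index, signed weight, share of L1 norm).
    """
    l1 = sum(abs(w) for w in decoder_vec)
    if l1 == 0:
        raise ValueError("decoder vector is all zeros")
    # Rank neuron indices by absolute weight, largest first.
    ranked = sorted(range(len(decoder_vec)),
                    key=lambda i: abs(decoder_vec[i]), reverse=True)
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1) for i in ranked[:k]]


# Hypothetical toy vector, not this feature's real decoder weights.
toy = [0.05, -0.30, 0.10, 0.40, 0.15]
for idx, value, share in neuron_alignment(toy):
    print(f"{idx:<6}{value:+.2f}   {share:.1%}")
```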
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
294     +0.21           0.02
311     +0.14           0.02
26      +0.12           0.02
Negative Logits
Token            Logit
ı                -2.62
Ŀ                -2.61
¯                -2.55
                 -2.55
↵                -2.55
↵                -2.55
↵                -2.55
č↵               -2.55
↵                -2.55
<|outofrange|>   -2.55
Positive Logits
Token     Logit
oque      2.20
creen     2.14
chool     2.00
pun       1.99
heet      1.83
weet      1.77
cript     1.71
pose      1.69
heets     1.64
offence   1.63
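The page does not explain how either logit list is computed. One common method, assumed here and not confirmed by this page, projects the feature's decoder direction through the model's unembedding matrix. The highest-scoring tokens form the positive list and the lowest-scoring tokens form the negative list. The sketch below does this with plain lists. `logit_effects`, the toy matrix and the placeholder vocabulary are hypothetical, and a real dashboard may apply normalization before the projection.

```python
def logit_effects(decoder_vec, unembed, vocab, k=3):
    """Score every vocabulary token by the dot product of the feature's
    decoder direction with that token's unembedding column.

    Assumed layout: unembed[d][t] is the weight from model dimension d
    to token t, so unembed has len(decoder_vec) rows and len(vocab) columns.
    Returns (top-k positive, bottom-k negative) lists of (token, score).
    """
    scores = [sum(decoder_vec[d] * unembed[d][t] for d in range(len(decoder_vec)))
              for t in range(len(vocab))]
    order = sorted(range(len(vocab)), key=lambda t: scores[t])  # ascending
    negative = [(vocab[t], scores[t]) for t in order[:k]]  # most negative first
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]  # largest first
    return positive, negative


# Hypothetical 2-dimensional model with a 4-token placeholder vocabulary.
feature_dir = [1.0, -1.0]
W_U = [[2.0, 0.0, 1.0, -1.0],
       [0.0, 1.0, 1.0, 1.0]]
pos, neg = logit_effects(feature_dir, W_U, ["a", "b", "c", "d"], k=2)
print("Positive:", pos)
print("Negative:", neg)
```

As in the lists above, the sketch orders the negative list from most negative upward and the positive list from largest downward.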
Activation Density: 0.082%
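This sketch assumes that activation density is the fraction of token positions where the feature's activation is above zero. Under that assumption, 0.082% means roughly 82 firing tokens per 100,000. The input below is constructed to show that arithmetic and is not real activation data. `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation is strictly above threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Constructed example: 82 firing positions out of 100,000.
acts = [0.0] * 99_918 + [0.5] * 82
print(f"Activation density: {activation_density(acts):.3%}")
```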