INDEX
Explanations
references to specific locations or organizations related to social or political topics, such as religious institutions, labor unions, or government reports
Neuron Alignment
Index  Value  % of L₁
752    +0.23  0.8%
50     +0.13  0.4%
227    +0.11  0.4%
Correlated Neurons
Index  P. Corr.  Cos Sim.
752    +0.23     0.09
732    +0.13     0.07
1984   +0.11     0.08
Negative Logits
<bos>          -1.11
Personendaten  -0.69
==""){         -0.69
lmfao          -0.64
idać           -0.64
bandai         -0.64
reszcie        -0.62
yaşında        -0.61
Савезне        -0.61
ricorda        -0.60
Positive Logits
Chá    0.84
Lég    0.83
Jú     0.79
Bár    0.78
alté   0.78
raste  0.78
Teks   0.78
Hæ     0.78
Libri  0.77
Jä     0.76
Activations Density: 0.477%