INDEX
Explanations
words related to locations or proper nouns related to buildings/places
Neuron Alignment
Index   Value   % of L₁
966     +0.10   0.4%
50      +0.10   0.4%
528     +0.07   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1343    +0.10      0.05
214     +0.10      0.03
283     +0.07      0.01
Negative Logits
<bos>     -1.05
lenmiş    -0.74
/*++      -0.74
lenir     -0.67
would     -0.67
<?        -0.66
became    -0.65
keep      -0.65
public    -0.65
accept    -0.65
Positive Logits
le        2.00
mef       1.74
effe      1.71
illi      1.68
sovere    1.68
Intere    1.67
socie     1.65
maneu     1.64
véhic     1.64
erec      1.63
Activations Density 0.202%