INDEX
Explanations
statements indicating large quantities or percentages
Neuron Alignment
Index    Value    % of L₁ norm
674      +0.13    0.4%
1265     +0.10    0.3%
1983     +0.09    0.3%
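The dashboard does not say how these columns are computed. One common reading, assumed here rather than confirmed by the source, is that "Value" is the feature's decoder weight on a given neuron and "% of L₁ norm" is that weight's magnitude divided by the L₁ norm of the whole decoder vector. The following minimal sketch uses that assumption and a hypothetical toy decoder vector:

```python
def neuron_alignment(decoder_vec, k=3):
    """Return the k neurons with the largest (most positive) decoder weights.

    Assumption: "Value" is the raw decoder weight and "% of L1 norm" is
    |weight| / sum(|weights|) * 100. Assumes decoder_vec is not all zeros.
    """
    l1 = sum(abs(w) for w in decoder_vec)
    # Rank neuron indices by signed weight, largest first.
    ranked = sorted(range(len(decoder_vec)), key=lambda i: decoder_vec[i], reverse=True)
    return [(i, decoder_vec[i], 100.0 * abs(decoder_vec[i]) / l1) for i in ranked[:k]]


# Hypothetical 4-neuron decoder vector, for illustration only.
for idx, value, pct in neuron_alignment([0.05, -0.02, 0.13, 0.10]):
    print(f"{idx:<8} {value:+.2f}    {pct:.1f}%")
```

In a real model the decoder vector has thousands of entries, so even the top neuron carries only a small share of the L₁ norm, which matches the sub-1% figures in the table.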
Correlated Neurons
Index    Pearson Corr.    Cos. Sim.
1983     +0.13            0.03
1516     +0.10            0.02
261      +0.09            0.01
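The two statistics in this table can be sketched as follows. This assumes, without confirmation from the dashboard, that the Pearson correlation compares the feature's activations with a neuron's activations over the same tokens, and that the cosine similarity compares two weight vectors (for example, the feature's decoder direction and a vector associated with the neuron). The sample data below is hypothetical:

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length activation sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cosine(u, v):
    """Cosine similarity of two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical per-token activations for one feature and one neuron.
feature_acts = [0.0, 0.0, 1.2, 0.0, 0.4]
neuron_acts = [0.1, -0.2, 0.9, 0.0, 0.3]
print(round(pearson(feature_acts, neuron_acts), 3))
```

Under this reading, a correlation of +0.13 alongside a cosine similarity near 0.03 would mean the neuron fires somewhat more often when the feature fires, even though the two directions are nearly orthogonal.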
Negative Logits
Pushkin              -0.57
adaptiveStyles       -0.49
raught               -0.47
;;)                  -0.47
alip                 -0.47
abstrait             -0.45
İstinadlar           -0.45
exemplaire           -0.43
StoreMessageInfo     -0.41
kasa                 -0.41
Positive Logits
躇                   0.55
<bos>                0.51
tasche               0.49
muualla              0.48
alliés               0.47
molle                0.46
Romains              0.45
essoal               0.45
ruines               0.45
นะนำ                 0.44
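Logit lists like the two above are typically obtained by projecting the feature's decoder direction through the model's unembedding matrix and reading off the most negative and most positive tokens. The dashboard does not state its exact procedure, so the following sketch is an assumption: it ignores any final normalization, and it uses a hypothetical 2-dimensional model with a 3-token vocabulary:

```python
def logit_effects(decoder_vec, W_U, vocab, k=1):
    """Project a decoder direction through the unembedding and rank tokens.

    W_U is a list of rows, one per residual dimension, each of length len(vocab).
    Returns (most negative k, most positive k) as (token, logit) pairs.
    """
    n_vocab = len(vocab)
    # logits[t] = sum over residual dims d of decoder_vec[d] * W_U[d][t]
    logits = [
        sum(decoder_vec[d] * W_U[d][t] for d in range(len(decoder_vec)))
        for t in range(n_vocab)
    ]
    order = sorted(range(n_vocab), key=lambda t: logits[t])  # ascending
    negative = [(vocab[t], logits[t]) for t in order[:k]]
    positive = [(vocab[t], logits[t]) for t in reversed(order[-k:])]
    return negative, positive


# Hypothetical toy model, for illustration only.
neg, pos = logit_effects([1.0, 0.5], [[0.2, -0.4, 0.1], [0.6, 0.0, -0.8]], ["a", "b", "c"])
print(neg, pos)
```

Here the token logits are a = 0.5, b = -0.4 and c = -0.3, so "b" is the most suppressed token and "a" the most promoted.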
Activation Density: 0.112%
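Activation density is usually the percentage of sampled tokens on which the feature's activation is above zero. The dashboard does not give its threshold or sample, so both are assumptions in the sketch below. Under this reading, 0.112% means the feature fires on roughly one token in every 900:

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds the (assumed) threshold."""
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical sample: the feature fires on 1 token out of 1000.
sample = [0.0] * 999 + [0.7]
print(activation_density(sample))
```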