Explanations
phrases involving reaching out to others or the community for support or connection
Neuron Alignment
Index   Value   % of L₁
1976    +0.07   0.2%
1233    +0.07   0.2%
279     +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
279     +0.07      0.03
1620    +0.07      0.02
1233    +0.07      0.02
Negative Logits
Token          Logit
ⓧ              -0.82
<bos>          -0.73
<?             -0.72
(not shown)    -0.70
<?             -0.69
/**            -0.63
/*             -0.60
spokoj         -0.54
chiếm          -0.54
constaté       -0.52
Positive Logits
Token       Logit
prodi       1.14
carc        1.12
incess      1.11
fatis       1.07
palme       1.07
saar        1.07
Palest      1.06
ados        1.03
Græ         1.01
monaster    1.00
Activations Density: 0.153%