INDEX
Explanations
prepositions and words related to communication and response
Neuron Alignment
Index   Value   % of L₁
1896    +0.12   0.4%
752     +0.10   0.3%
1023    +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
898     +0.12      0.04
761     +0.10      0.03
1845    +0.10      0.03
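The two columns above measure different things, which may explain why a neuron can show a Pearson correlation of +0.12 alongside a cosine similarity of only 0.04. The sketch below shows the standard definitions of both. It assumes the correlation is taken over activation values and the cosine similarity over direction vectors; the page itself does not say how either is computed, and the function names `pearson` and `cosine` are illustrative only.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (mean-centred)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (no mean-centring)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

Pearson correlation subtracts the mean before comparing, while cosine similarity does not, so the two can diverge even when applied to the same pair of vectors.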
Negative Logits
Token           Logit
audiovisuel     -0.60
peines          -0.59
dégust          -0.58
<bos>           -0.57
déclarations    -0.56
toscana         -0.56
gawas           -0.55
Améli           -0.54
boîtes          -0.53
ఔ               -0.52
Positive Logits
Token           Logit
stimuli         0.66
requests        0.54
criticisms      0.53
stimulus        0.51
questions       0.51
criticism       0.50
challenges      0.49
suggestions     0.48
queries         0.48
threats         0.48
Activations Density: 0.165%