INDEX
Explanations
phrases related to decision-making and guiding principles
Neuron Alignment
Index   Value   % of L₁
1044    +0.11   0.3%
674     +0.10   0.3%
1921    +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1044    +0.11      0.03
1921    +0.10      0.03
1993    +0.09      0.03
Negative Logits
belliger     -0.65
Dica         -0.63
idać         -0.62
Marín        -0.62
Minha        -0.62
hdashline    -0.61
Gonçalves    -0.59
Præ          -0.59
shenan       -0.58
increa       -0.57
Positive Logits
Walkover      0.66
regards       0.59
Datuak        0.57
تانيه         0.55
carina        0.53
urbanas       0.52
familières    0.50
acerca        0.49
regarding     0.49
cref          0.49
Activations Density: 0.070%