Explanations
organizations, institutions, and government-related acronyms (New Auto-Interp)
Neuron Alignment
Index   Value   % of L₁
227     +0.12   0.4%
50      +0.10   0.3%
478     +0.10   0.3%
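The Neuron Alignment columns can be read as follows, assuming (as in common SAE dashboards) that Value is the feature's decoder weight on a given neuron and % of L₁ is the absolute value of that weight divided by the L1 norm of the whole decoder vector. The sketch below is a hypothetical illustration with a made-up helper (`neuron_alignment`) and toy data, not the dashboard's own code.

```python
import numpy as np

def neuron_alignment(decoder_vec, k=3):
    """Return (neuron index, signed weight, share of L1 norm) for the k
    largest-magnitude entries of a feature's decoder vector.

    Assumption: decoder_vec is the feature's decoder direction expressed
    in the same basis as the neurons being listed.
    """
    v = np.asarray(decoder_vec, dtype=float)
    l1 = np.abs(v).sum()
    # Stable sort so ties keep their original neuron order.
    top = np.argsort(-np.abs(v), kind="stable")[:k]
    return [(int(i), float(v[i]), float(abs(v[i]) / l1)) for i in top]

# Toy vector with L1 norm 1.0: neuron 1 holds half of the norm.
print(neuron_alignment([0.125, -0.5, 0.25, 0.125], k=2))
```

With real dashboard data, a share of 0.4% simply means the top neuron carries a tiny fraction of the decoder vector's mass, i.e. the feature is not aligned with any single neuron.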
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
227     +0.12           0.07
577     +0.10           0.06
1544    +0.10           0.04
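Correlated Neurons reports two similarity measures. A minimal sketch, assuming P. Corr. is the Pearson correlation between the feature's activations and a neuron's activations over the same tokens, and Cos Sim. is the cosine similarity between the feature's weight vector and that neuron's direction; both function names are hypothetical.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation between two equal-length activation series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def cosine_similarity(u, v):
    """Cosine of the angle between two weight vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Activations that rise together correlate perfectly; orthogonal weights have cosine 0.
print(pearson([1, 2, 3], [2, 4, 6]), cosine_similarity([1, 0], [0, 1]))
```

The two numbers answer different questions: correlation is measured on activations over data, while cosine similarity compares weights directly, which is why a neuron can show a small positive correlation alongside a near-zero cosine.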
Negative Logits
vendar      -0.73
répondit    -0.73
kaban       -0.71
(;;)        -0.69
bonté       -0.69
télévis     -0.69
saurait     -0.69
romain      -0.69
originaire  -0.68
dégust      -0.67
Positive Logits
shenan      0.82
ineffec     0.79
disagre     0.75
suscep      0.74
depic       0.71
unspeak     0.71
apprehen    0.69
impra       0.68
indescri    0.68
milf        0.68
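Lists like Negative Logits and Positive Logits are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the lowest and highest scoring tokens. A minimal sketch under that assumption; `logit_effects`, `decoder_vec`, `W_U` and `vocab` are hypothetical names, and a real dashboard may apply extra normalization (for example a final layer norm) before ranking.

```python
import numpy as np

def logit_effects(decoder_vec, W_U, vocab, k=10):
    """Rank vocabulary tokens by one feature's direct effect on the logits.

    decoder_vec: (d_model,) decoder direction of the feature (assumed).
    W_U: (d_model, d_vocab) unembedding matrix (assumed layout).
    vocab: list of d_vocab token strings.
    """
    logits = np.asarray(decoder_vec, dtype=float) @ np.asarray(W_U, dtype=float)
    order = np.argsort(logits, kind="stable")  # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy model: identity unembedding, so each token's logit equals one decoder entry.
W_U = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(logit_effects([0.5, -1.0, 2.0], W_U, ["a", "b", "c"], k=1))
```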
Activation Density: 0.285%
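Activation density is usually the fraction of sampled tokens on which the feature fires (activation above zero), so 0.285% would mean roughly one token in 350. A sketch assuming that definition, with a hypothetical `activation_density` helper and an adjustable `threshold` parameter.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    acts = np.asarray(acts, dtype=float)
    return float((acts > threshold).mean())

# The feature fires on one of four tokens.
print(activation_density([0.0, 0.0, 1.3, 0.0]))
```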