Explanations
phrases related to questioning or highlighting specific issues for further examination
Neuron Alignment
Index    Value    % of L₁
1842     +0.11    0.3%
1473     +0.10    0.3%
919      +0.10    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
919      +0.11       0.03
59       +0.10       0.04
342      +0.10       0.04
Negative Logits
klap       -0.97
peculi     -0.94
spion      -0.94
stoff      -0.92
ordina     -0.92
plak       -0.90
solidar    -0.89
simplif    -0.88
profi      -0.86
hek        -0.84
Positive Logits
raught          0.62
sobald          0.59
syndrome        0.57
bigsqcup        0.56
andaş           0.54
AfterClass      0.53
messageInfo     0.53
(['/            0.52
findViewById    0.51
addField        0.49
Activations Density: 0.641%