INDEX
Explanations
names of people and places, as well as specific words and phrases related to political and historical contexts
Neuron Alignment
Index   Value   % of L₁
381     +0.15   0.4%
876     +0.10   0.3%
872     +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
981     +0.15      0.06
678     +0.10      0.05
82      +0.09      0.03
Negative Logits
Volumen            -0.60
Denna              -0.59
InputDecoration    -0.57
História           -0.55
SeekBar            -0.55
Opéra              -0.54
calciatore         -0.54
Serviço            -0.53
Córdoba            -0.53
ExecuteNonQuery    -0.52
Positive Logits
scrat              1.43
peppa              1.24
suscep             1.22
wikihow            1.17
excru              1.14
perfet             1.13
michelin           1.12
affez              1.11
unwarran           1.11
hairc              1.10
Activations Density: 0.400%