INDEX
Explanations
health-related words and medical conditions, especially symptoms and disorders
Neuron Alignment
Index   Value   % of L₁
394     +0.12   0.4%
198     +0.11   0.3%
453     +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
184     +0.12      0.03
946     +0.11      0.05
167     +0.10      0.03
Negative Logits
Geplaatst    -0.62
veniva       -0.52
doveva       -0.47
pú           -0.46
Италијани    -0.46
worin        -0.45
bolí         -0.44
RTLD         -0.43
DockStyle    -0.43
vitaminas    -0.42
Positive Logits
intermitt    0.94
disagre      0.91
purcha       0.90
quitted      0.89
fuf          0.89
emphat       0.87
affor        0.85
inconce      0.84
befo         0.83
gaily        0.82
Activations Density 0.500%