INDEX
Explanations
information about organizations, locations, events, and other specific entities mentioned in documents or news articles
Neuron Alignment
Index   Value   % of L₁
227     +0.12   0.3%
964     +0.11   0.3%
1013    +0.11   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
227     +0.12      0.09
964     +0.11      0.06
766     +0.11      0.06
Negative Logits
vola        -0.70
hek         -0.67
krab        -0.65
toilette    -0.62
quitt       -0.61
marte       -0.61
aquare      -0.59
stiller     -0.59
zui         -0.57
elek        -0.57
Positive Logits
maneu       0.81
shenan      0.79
depic       0.78
sophistic   0.75
stickied    0.72
accla       0.72
impra       0.70
attemp      0.68
upvoted     0.67
resemb      0.67
Activations Density: 0.872%