INDEX
Explanations
mentions of graffiti in a text
Neuron Alignment
Index   Value   % of L₁
906     +0.13   0.4%
939     +0.09   0.3%
1013    +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
939     +0.13      0.05
1804    +0.09      0.04
1780    +0.08      0.02
Negative Logits
hairc     -1.17
scrat     -1.07
cushi     -1.06
snoopy    -1.04
milf      -1.04
strick    -1.02
shenan    -1.01
perfet    -1.01
🤣🤣      -0.97
inconce   -0.96
Positive Logits
vandalism       0.76
erected         0.68
vandal          0.66
installation    0.64
installed       0.62
mural           0.61
graffiti        0.60
installations   0.60
install         0.59
installed       0.55
Activations Density: 0.551%