Explanations
instances where the word "affirm" is mentioned quite strongly
Neuron Alignment
Index   Value   % of L₁
1677    +0.07   0.2%
1376    +0.07   0.2%
735     +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1120    +0.07      0.03
1363    +0.07      0.03
1905    +0.07      0.03
Negative Logits
fortawesome   -0.59
퀀            -0.57
/*            -0.56
fit           -0.55
يديو          -0.54
thenReturn    -0.53
Parcelize     -0.51
kit           -0.51
fits          -0.50
kits          -0.50
Positive Logits
affirm        1.39
affirmation   1.33
affirmed      1.24
disagre       1.24
shenan        1.19
affirms       1.18
affirming     1.17
emphat        1.16
Juf           1.14
Affirm        1.13
Activations Density: 0.278%