Explanations
phrases related to empathy and racial understanding
Neuron Alignment
Index   Value   % of L₁
1356    +0.09   0.3%
1241    +0.08   0.2%
1078    +0.07   0.2%
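The Neuron Alignment table appears to list the neurons that carry the largest weights in this feature's decoder direction. Below is a minimal Python sketch of how such a table could be derived. It assumes "Value" is the signed decoder weight for each neuron and "% of L₁" is that weight's absolute value divided by the L₁ norm of the whole decoder row. The function name `neuron_alignment` and the toy weights are hypothetical.

```python
from typing import List, Tuple


def neuron_alignment(decoder_row: List[float], k: int = 3) -> List[Tuple[int, float, float]]:
    """Return (neuron index, weight, share of L1 norm) for the k largest weights.

    Assumption: neurons are ranked by signed decoder weight, and the share is
    |weight| / sum(|w|) taken over the full decoder row.
    """
    l1_norm = sum(abs(w) for w in decoder_row)
    # Rank neuron indices by signed weight, largest first.
    ranked = sorted(range(len(decoder_row)), key=lambda i: decoder_row[i], reverse=True)
    return [(i, decoder_row[i], abs(decoder_row[i]) / l1_norm) for i in ranked[:k]]


# Toy decoder row with four neurons (hypothetical values).
for index, value, share in neuron_alignment([0.5, -0.25, 0.15625, 0.09375]):
    print(f"{index:>5}  {value:+.2f}  {share:.1%}")
```

On a real model the row has one entry per neuron in the layer. The L₁ norm is therefore summed over many weights, so even the top-ranked neuron can hold only a small share of it.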
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
1650    +0.09           0.03
1241    +0.08           0.04
1714    +0.07           0.03
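The Correlated Neurons table gives each neuron a correlation score and a cosine-similarity score. This sketch assumes both scores compare the feature's activations with the neuron's activations over the same set of tokens. Under that assumption, the only difference is that Pearson correlation subtracts each series' mean first and cosine similarity does not, which is one way the two columns can disagree. The sketch uses only the standard library; the function names and data are illustrative.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: cosine similarity of the mean-centred series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def cosine(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine similarity of the raw (uncentred) series."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


# Hypothetical activations over five tokens: a sparse feature and a dense neuron.
feature_acts = [0.0, 0.0, 2.0, 0.0, 0.0]
neuron_acts = [0.1, 0.3, 0.4, 0.2, 0.1]
print(f"pearson={pearson(feature_acts, neuron_acts):+.2f}  cosine={cosine(feature_acts, neuron_acts):.2f}")
```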
Negative Logits
shenan      -1.50
reluct      -1.48
disreg      -1.39
indestru    -1.38
accla       -1.34
disagre     -1.34
Juf         -1.33
inev        -1.32
excru       -1.32
uninten     -1.31
Positive Logits
empathy       0.74
getItemId     0.68
experiences   0.64
postmedia     0.63
understand    0.62
MLLoader      0.61
perspective   0.60
realpath      0.59
onAttach      0.59
RTLR          0.59
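The Negative and Positive Logits lists rank vocabulary tokens by how much this feature pushes their logits down or up. One common way to build such lists is to project the feature's decoder direction through the model's unembedding matrix and ignore any final layer norm. That method is assumed here, not confirmed by the source. The sketch below uses a hypothetical two-dimensional model with a three-token vocabulary; the names `logit_effects`, `w_unembed`, and the `tok_*` tokens are illustrative.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    w_unembed: Sequence[Sequence[float]],  # shape [d_model][d_vocab]
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative direct logit effects."""
    d_model, d_vocab = len(decoder_dir), len(vocab)
    # Effect on token t = dot product of the decoder direction with column t of W_U.
    effects = [
        sum(decoder_dir[i] * w_unembed[i][t] for i in range(d_model))
        for t in range(d_vocab)
    ]
    ascending = sorted(range(d_vocab), key=lambda t: effects[t])  # most negative first
    descending = ascending[::-1]                                  # most positive first
    positive = [(vocab[t], effects[t]) for t in descending[:k]]
    negative = [(vocab[t], effects[t]) for t in ascending[:k]]
    return positive, negative


positive, negative = logit_effects(
    decoder_dir=[1.0, 0.0],
    w_unembed=[[0.5, -1.0, 2.0], [3.0, 3.0, 3.0]],
    vocab=["tok_a", "tok_b", "tok_c"],
    k=1,
)
print("positive:", positive)
print("negative:", negative)
```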
Activation Density: 0.368%
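Activation density is usually read as the percentage of tokens in the evaluation dataset on which the feature's activation is nonzero. Assuming that definition, 0.368% means the feature fires on roughly one token in 270. The sketch below shows the calculation; the activation values are hypothetical. It counts strictly positive activations, which is the same as counting nonzero ones for a ReLU-style encoder.

```python
from typing import Sequence


def activation_density(acts: Sequence[float]) -> float:
    """Percentage of tokens on which the feature activation is strictly positive."""
    if not acts:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in acts if a > 0)
    return 100.0 * firing / len(acts)


print(f"Activation density: {activation_density([0.0, 0.0, 1.3, 0.0]):.3f}%")
```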