INDEX
Explanations
instances where hard work and effort are mentioned or emphasized
Neuron Alignment
Index    Value    % of L₁
1013     +0.13    0.4%
297      +0.10    0.3%
241      +0.10    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
297      +0.13       0.04
1811     +0.10       0.03
1876     +0.10       0.03
Negative Logits
ivi         -0.64
parteci     -0.58
apparti     -0.56
grazia      -0.55
tind        -0.55
rimanere    -0.55
diame       -0.54
inder       -0.54
igno        -0.53
liev        -0.53
Positive Logits
hard           0.76
work           0.68
effort         0.68
Hard           0.67
Work           0.67
Hard           0.65
harder         0.63
HARD           0.61
hardworking    0.61
Harder         0.61
Activations Density 0.243%