INDEX
Explanations
phrases related to opinions and evaluations of movies
Neuron Alignment
Index   Value   % of L₁
1919    +0.08   0.2%
143     +0.07   0.2%
3       +0.06   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1919    +0.08      0.06
1607    +0.07      0.03
143     +0.06      0.04
Negative Logits
ordina   -1.18
sappi    -1.16
fluo     -1.16
nutr     -1.16
canel    -1.15
franz    -1.15
lele     -1.13
mef      -1.12
kram     -1.12
hcm      -1.11
Positive Logits
reluctantly     0.79
admit           0.77
agree           0.77
acknowledge     0.74
grud            0.74
eventually      0.73
agreed          0.71
admitted        0.69
acknowledging   0.69
finally         0.68
Activations Density: 0.361%