Explanations
References to specific scenes or moments in a story or movie
Neuron Alignment
Index   Value   % of L₁
184     +0.15   0.4%
906     +0.14   0.4%
1577    +0.13   0.4%
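A minimal sketch of how a neuron-alignment table like the one above could be produced. It assumes, since this page does not say, that "Value" is the entry of the feature's decoder vector at each MLP neuron and that "% of L₁" is that entry's magnitude divided by the L₁ norm of the whole vector. The function name `neuron_alignment`, the ranking by absolute weight, and the toy vector are all hypothetical.

```python
def neuron_alignment(decoder_row, k=3):
    """Return the k neurons most aligned with one SAE feature.

    decoder_row: the feature's decoder vector in the MLP-neuron basis
    (a hypothetical input; one float per neuron).
    Returns a list of (neuron_index, value, fraction_of_l1) tuples.
    """
    # L1 norm of the whole decoder vector: the denominator for "% of L1".
    l1 = sum(abs(w) for w in decoder_row)
    # Rank neuron indices by absolute weight, largest first (an assumption;
    # the dashboard's exact ranking rule is not shown).
    ranked = sorted(range(len(decoder_row)),
                    key=lambda i: abs(decoder_row[i]), reverse=True)
    return [(i, decoder_row[i], abs(decoder_row[i]) / l1) for i in ranked[:k]]


# Toy 5-neuron decoder vector whose L1 norm is exactly 1.0.
row = [0.0, 0.25, -0.125, 0.125, -0.5]
for idx, value, frac in neuron_alignment(row, k=2):
    print(f"{idx:>5}  {value:+.2f}  {frac:.1%}")
```

With a realistic MLP width the L₁ norm is spread over thousands of neurons, so even the top entries account for well under 1% each, as in the table.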
Correlated Neurons
Index   P. Corr.   Cos Sim.
736     +0.15      0.04
2044    +0.14      0.04
334     +0.13      0.03
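The two correlated-neuron columns measure different things. Below is a plain-Python sketch of both statistics, assuming "P. Corr." is the Pearson correlation between the feature's activations and a neuron's activations over the same tokens, and "Cos Sim." is the cosine similarity between two weight vectors. Which weight vectors the dashboard compares is not stated here, so all inputs are hypothetical toy data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences, e.g. one feature's
    activations and one neuron's activations over the same tokens."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def cosine_similarity(u, v):
    """Cosine of the angle between two weight vectors (no mean-centering)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Toy per-token activations: the neuron tends to fire where the feature does.
feature_acts = [0.0, 0.0, 2.0, 0.0, 1.5]
neuron_acts = [0.1, -0.2, 0.9, 0.0, 0.4]
print(round(pearson(feature_acts, neuron_acts), 3))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

Pearson correlation is the cosine similarity of the mean-centered series. Under the assumption above, though, the two columns are applied to different quantities (activations vs. weights), so a high value in one does not imply a high value in the other.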
Negative Logits
bandung      -1.29
hcm          -1.28
maksi        -1.25
jaya         -1.21
lele         -1.21
Tangerang    -1.18
siena        -1.16
saar         -1.15
Juf          -1.14
dises        -1.14
Positive Logits
hilarious       0.63
memorable       0.62
highlight       0.57
fun             0.55
bardziej        0.54
really          0.54
executed        0.54
unforgettable   0.53
wow             0.53
świet           0.53
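The logit lists above rank vocabulary tokens by how strongly the feature pushes their logits down or up. Here is a minimal sketch, assuming the score is the dot product of the feature's output direction in the residual stream with each token's unembedding vector (a column of W_U), and ignoring the final LayerNorm. `logit_effects` is hypothetical, and the token names are borrowed from the lists above purely as labels; the vectors are made up.

```python
def logit_effects(direction, unembed, k=3):
    """Rank tokens by the feature's direct effect on their logits.

    direction: the feature's output direction in the residual stream
               (hypothetical input; one float per residual dimension).
    unembed:   dict mapping each token string to its unembedding vector,
               i.e. that token's column of W_U.
    Returns (most_negative, most_positive), each a list of (token, score).
    The final LayerNorm is ignored in this sketch.
    """
    scores = {
        token: sum(d * u for d, u in zip(direction, vec))
        for token, vec in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1])  # ascending
    return ranked[:k], ranked[::-1][:k]


# Toy 2-dimensional residual stream and a four-token vocabulary.
direction = [1.0, -1.0]
unembed = {
    "fun": [0.5, 0.0],
    "wow": [0.25, 0.0],
    "saar": [0.0, 1.0],
    "jaya": [0.0, 0.5],
}
negative, positive = logit_effects(direction, unembed, k=2)
print("negative:", negative)
print("positive:", positive)
```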
Activation Density: 0.250%
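Activation density is the fraction of tokens on which the feature fires, so 0.250% corresponds to 1 token in 400. The sketch below assumes that "fires" means an activation strictly above zero. The `threshold` parameter and the toy activations are illustrative.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens on which a feature's activation exceeds `threshold`."""
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy run: the feature fires on 1 of 400 tokens.
acts = [0.0] * 399 + [3.2]
density = activation_density(acts)
print(f"Activation density {density:.3%}")
```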