INDEX
Explanations
descriptions or opinions about favorites or preferences
Neuron Alignment
Index   Value   % of L₁
897     +0.13   0.4%
629     +0.12   0.4%
1839    +0.11   0.4%
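As a rough illustration of how a "% of L₁" column could be derived, the sketch below assumes that "Value" is a neuron's weight in the feature's direction vector and that the percentage is the absolute weight divided by the vector's L₁ norm. That reading is an assumption and is not stated on this page; `l1_share` and its arguments are hypothetical names.

```python
import numpy as np

def l1_share(weights, top_k=3):
    """Return (index, value, share of L1 norm) for the top_k largest-magnitude weights.

    Assumption: "% of L1" means |w_i| / sum_j |w_j| over the whole vector.
    """
    weights = np.asarray(weights, dtype=float)
    l1_norm = np.abs(weights).sum()
    if l1_norm == 0.0:
        return []  # an all-zero vector has no meaningful shares
    # Sort indices by descending absolute weight and keep the first top_k.
    order = np.argsort(-np.abs(weights))[:top_k]
    return [(int(i), float(weights[i]), float(abs(weights[i]) / l1_norm)) for i in order]

# Toy vector (not data from this page): the L1 norm is 4.0.
rows = l1_share([0.5, -1.5, 2.0], top_k=2)
for index, value, share in rows:
    print(f"{index}\t{value:+.2f}\t{share:.1%}")
```

Under this reading, three weights near 0.12 each holding about 0.4% of the L₁ norm would suggest the feature is spread thinly across many neurons rather than aligned with any single one.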
Correlated Neurons
Index   P. Corr.   Cos Sim.
629     +0.13      0.02
783     +0.12      0.02
1839    +0.11      0.02
Negative Logits
iesp          -0.58
Statistiche   -0.56
"]));         -0.54
")"           -0.54
"/",          -0.53
">...         -0.52
]),           -0.51
'])->         -0.50
])){          -0.50
]*(           -0.49
Positive Logits
favorite      0.85
Favorite      0.74
FAVORITE      0.72
favorite      0.70
favorites     0.68
maneu         0.67
Favorite      0.67
Compañ        0.66
orite         0.65
favourite     0.65
Activations Density 0.083%
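
A minimal sketch of how an activation density figure like this could be computed, assuming it means the fraction of sampled tokens on which the feature's activation is above zero. That definition is an assumption and is not stated on this page; `activation_density` and `threshold` are hypothetical names.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds threshold.

    Assumption: "density" counts a token as active when its activation
    is strictly greater than threshold (zero by default).
    """
    acts = np.asarray(activations, dtype=float)
    if acts.size == 0:
        return 0.0  # no tokens sampled, so report zero density
    return float((acts > threshold).mean())

# Toy activations (not data from this page): 1 active token out of 4.
density = activation_density([0.0, 0.0, 0.5, 0.0])
print(f"{density:.3%}")
```

On that reading, a density of 0.083% would correspond to the feature firing on roughly one token in 1,200.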