Explanations
intellectual task that human
Negative Logits
depth        0.41
Vinyl        0.38
плат         0.37  (Russian subword)
測           0.37  (CJK: "measure")
Dad          0.36
ponents      0.36
vinyl        0.36
distintas    0.36  (Spanish: "distinct")
心理         0.36  (CJK: "psychology")
深度         0.35  (CJK: "depth")
Positive Logits
Metal        0.62
Metall       0.56
Slayer       0.55
Nuclear      0.53
Metal        0.52
NUCLEAR      0.49
METAL        0.48
Metals       0.47
핵           0.47  (Korean: "nuclear")
metals       0.46
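Lists like the two above are often produced by projecting a feature's decoder direction through the model's unembedding matrix and taking the top and bottom tokens. The following is a minimal sketch under that assumption (this page does not confirm how its values were computed or normalized). The names `w_dec`, `w_u`, and `top_logit_effects` are hypothetical.

```python
import numpy as np

def top_logit_effects(w_dec, w_u, vocab, k=10):
    """Rank tokens by the direct logit effect of one feature's decoder direction.

    Assumed shapes (hypothetical):
      w_dec: (d_model,)             decoder vector of a single feature
      w_u:   (d_model, vocab_size)  unembedding matrix
      vocab: list of vocab_size token strings
    """
    scores = w_dec @ w_u          # (vocab_size,) logit change per token
    order = np.argsort(scores)    # ascending: most suppressed tokens first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example with a 2-dimensional model and a 3-token vocabulary.
w_dec = np.array([1.0, 0.0])
w_u = np.array([[0.5, -0.2, 0.9],
                [0.0, 0.0, 0.0]])
pos, neg = top_logit_effects(w_dec, w_u, ["a", "b", "c"], k=1)
print(pos, neg)
```

Real dashboards may rescale these scores or drop the sign on the negative list, so the printed numbers are not directly comparable to the values shown above.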
Activation density: 0.002%
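Activation density is usually the fraction of sampled tokens on which the feature fires. Under that reading, 0.002% equals 2 × 10⁻⁵, or about 1 token in 50,000. Below is a minimal sketch that assumes "fires" means an activation strictly greater than zero. The function name `activation_density` is hypothetical.

```python
import numpy as np

def activation_density(acts):
    """Fraction of tokens where a feature is active (activation > 0).

    acts: 1-D array of one feature's activations over a token sample (assumed).
    """
    acts = np.asarray(acts)
    return np.count_nonzero(acts > 0) / acts.size

# Toy sample: the feature fires on 2 of 100,000 tokens.
acts = np.zeros(100_000)
acts[[5, 42]] = 1.3
density = activation_density(acts)
print(f"{density * 100:.3f}%")  # density expressed as a percentage
```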