Explanations
Gemma team at Google DeepMind
Negative Logits
/**/*.{         0.42
ब्राह            0.41
bagels          0.40
टो              0.38
JScripts        0.38
Elkus           0.38
prohibitions    0.38
chorizo         0.38
tôt             0.37
imageHeight     0.37
Positive Logits
clipped         0.42
monkey          0.40
留学            0.36
evident         0.36
Move            0.35
ંમે             0.35
max             0.35
ামে             0.35
placeholder     0.35
xc              0.34
Activations Density 0.003%