Explanations
Gemma team at Google DeepMind
Negative Logits
ierge  0.70
harga  0.68
िशा  0.62
sep  0.60
ಔ  0.58
센  0.58
ௐ  0.57
क्वीन  0.57
浐  0.57
लवकर  0.56
Positive Logits
Dest  0.54
தண்ட  0.52
Rub  0.51
rub  0.51
rubbing  0.50
Gemma  0.50
Dest  0.50
ரீ  0.49
Rub  0.48
ixas  0.48
Activation density: 0.227%