Explanations
Gemma, an open-weights model
Negative Logits
Token       Value
룸          0.76
aditional   0.75
Room        0.68
домаш       0.68
atot        0.68
iedad       0.66
スマ        0.65
ai          0.64
कक्ष        0.63
Room        0.63
Positive Logits
Token       Value
epsilon     0.79
stainless   0.77
exposed     0.77
raw         0.75
glassy      0.75
Exposed     0.74
abiertas    0.73
ওপ          0.73
exposed     0.73
abiertos    0.72
Activations Density 0.027%