Explanations
model responses about capabilities
Negative Logits
Token       Value
Here        0.60
هنا         0.58
aquí        0.57
here        0.57
Here        0.57
HERE        0.53
Aquí        0.52
HERE        0.51
aqui        0.50
tutaj       0.50
Positive Logits
Token       Value
hexyl       0.45
Talk        0.40
concret     0.40
programs    0.39
adet        0.38
функциони   0.37
Jared       0.37
reviewer    0.37
industrial  0.36
Junk        0.36
Activations Density 0.112%