Explanations
No Explanations Found
Negative Logits
Token      Logit
b          0.44
c          0.39
1          0.39
t          0.39
2          0.39
0          0.38
ing        0.38
3          0.36
-          0.35
(blank)    0.35
Positive Logits
Token        Logit
dieci        0.45
interrom     0.41
oito         0.39
entusiasmo   0.38
vijf         0.38
linguagem    0.38
zweimal      0.38
uomini       0.37
zusamm       0.37
dotato       0.37
Activations Density 0.000%
No Known Activations
This feature has no known activations.