INDEX
Explanations
foreign languages and concepts
NEGATIVE LOGITS
A       0.75
n       0.70
p       0.59
J       0.58
ford    0.53
gdf     0.53
N       0.53
r       0.50
X       0.49
T       0.49
POSITIVE LOGITS
څرنګ        0.57
Гуляць      0.56
chromos     0.54
Кан         0.53
언          0.53
Стаўкі      0.53
ವಿದೆ        0.52
の値        0.52
какво       0.50
sandwiched  0.49
Activations Density: 0.000%