INDEX
Explanations
occurrences of the substring "Wa"
Negative Logits
ment      -0.16
åij       -0.15
ÙħÙĨ      -0.15
ments     -0.15
udes      -0.15
меÑĩ      -0.15
کس      -0.15
olicit    -0.14
zar       -0.14
fy        -0.14
Positive Logits
Wa        0.29
Wa        0.28
wa        0.26
WA        0.21
WA        0.18
wa        0.17
ifu       0.17
waiver    0.17
agner     0.17
ter       0.17
Activations Density 0.013%