INDEX
Explanations
specific references to health-related projects and methodologies
after certain capitalized tokens
foreign language terms
Negative Logits
Token     Logit
,         -0.55
<bos>     -0.53
.         -0.53
↵↵        -0.53
"         -0.52
:         -0.51
↵         -0.50
-         -0.50
(blank)   -0.49
...       -0.49
Positive Logits
Token       Logit
kysy        0.35
šķ          0.30
femininas   0.30
aikaa       0.29
bēr         0.28
vandens     0.28
jäsen       0.27
vastaan     0.26
näky        0.26
päivä       0.26
Activations Density 24.943%