INDEX
Explanations
phrases that indicate a call to action, particularly links for more information or resources
Negative Logits
apult       -0.17
enville     -0.15
iffin       -0.15
ιά          -0.14
uitka       -0.14
便          -0.14
ÑĮко        -0.14
mentation   -0.14
celik       -0.14
suf         -0.13
Positive Logits
fore        0.17
/tasks      0.14
GRADE       0.14
ford        0.14
fore        0.14
å¼          0.14
arah        0.14
稿          0.14
udd         0.13
affen       0.13
Activations Density 0.009%