INDEX
Explanations
phrases indicating duration, testing, and various forms of value assessment
Negative Logits
afil      -0.15
away      -0.14
kk        -0.14
rq        -0.14
ammed     -0.14
unci      -0.14
Salman    -0.14
aub       -0.14
ikk       -0.14
aro       -0.14
Positive Logits
sake      0.44
purposes  0.35
reason    0.23
reasons   0.22
purpose   0.22
èĢĮ       0.21
purpose   0.20
Purpose   0.19
reason    0.19
PURPOSE   0.17
Activations Density 0.224%