INDEX
Explanations
modal verbs followed by outcomes
Negative Logits
används      0.31
Represents   0.30
belongs      0.29
illuminates  0.29
embodies     0.28
represents   0.28
Provides     0.28
provides     0.27
represents   0.27
occupies     0.26
Positive Logits
σημα      0.26
오히려    0.26
resulted  0.24
ἧ         0.23
incre     0.23
नेक       0.23
詈        0.22
使其      0.22
ctors     0.22
نتیجه     0.21
Activations Density: 0.019%