INDEX
Explanations
significant nouns and verbs related to objectives and actions
Negative Logits
aram       -0.16
thiên      -0.15
abox       -0.14
/inet      -0.14
ausal      -0.14
oste       -0.14
ANI        -0.14
ects       -0.13
extract    -0.13
ervers     -0.13
Positive Logits
dafür      0.20
dazu       0.19
具体       0.16
ulla       0.15
زن         0.14
ettes      0.14
Wy         0.14
attendant  0.14
519        0.14
reversal   0.13
Activations Density 0.026%