Explanations
verbs indicating past actions or states
Negative Logits
izens      -0.16
írk        -0.15
rror       -0.15
Created    -0.15
bjerg      -0.15
ushi       -0.14
eniable    -0.14
Asked      -0.14
erect      -0.14
bsolute    -0.14
Positive Logits
met          0.31
greeted      0.28
met          0.21
mar          0.21
wid          0.20
resisted     0.20
Met          0.19
accompanied  0.19
approved     0.18
widely       0.18
Activations Density 0.206%