INDEX
Explanations
phrases indicating ability or potential actions
Negative Logits
ils          -0.17
preliminary  -0.17
antar        -0.16
allas        -0.15
Assertions   -0.15
illi         -0.14
Prel         -0.14
VRT          -0.14
monds        -0.14
agar         -0.13
Positive Logits
ableObject      0.17
ombat           0.16
upal            0.16
.scalablytyped  0.16
HAM             0.15
addCriterion    0.15
ynn             0.14
ingu            0.14
molec           0.14
arella          0.14
Activations Density: 0.090%