INDEX
Explanations
verbs indicating actions or processes related to creation or production
Negative Logits
Token       Logit
/remove     -0.21
heimer      -0.19
/disable    -0.18
erk         -0.18
ings        -0.17
/use        -0.16
877         -0.16
naken       -0.15
giy         -0.15
ignon       -0.15
Positive Logits
Token       Logit
ly          0.27
thereby     0.22
perhaps     0.20
therein     0.19
thus        0.17
/logging    0.17
notamment   0.17
/loading    0.16
ss          0.16
/mod        0.15
Activations Density: 0.314%