INDEX
Explanations
words signaling comparison or potential actions
instances of lists or enumerations
Negative Logits
Token         Logit
ety           -0.78
iple          -0.69
cember        -0.65
herent        -0.62
lore          -0.61
uscript       -0.59
usra          -0.59
lord          -0.59
vre           -0.58
nutshell      -0.58

Positive Logits
Token         Logit
regardless     1.21
lest           1.19
preferably     1.18
irrespective   1.08
thereby        1.04
albeit         1.03
however        0.98
whereas        0.93
unless         0.93
depending      0.93
Activations Density 0.584%
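
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes that "activations density" means the fraction of sampled token positions on which the feature fires above zero; the page itself does not state the definition. The function name `activation_density`, the `threshold` parameter, and the sample values are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: density = (positions where the feature is active) / (all positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 token positions, 6 of which activate the feature.
sample = [0.0] * 994 + [0.4, 1.2, 0.7, 2.1, 0.3, 0.9]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.600%
```

Under this reading, a density of 0.584% would mean the feature is active on roughly 6 of every 1,000 sampled tokens, which fits a feature that responds to a narrow class of connective words.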