INDEX
Explanations
instances of the word "more" in various contexts
Negative Logits
ORS       -0.73
gee       -0.72
pper      -0.70
minus     -0.70
assis     -0.70
ogens     -0.70
poke      -0.69
maid      -0.68
eson      -0.68
master    -0.66
Positive Logits
importantly      0.76
sophisticated    0.75
occurrences      0.75
frustrated       0.73
invitations      0.73
intrusive        0.71
refined          0.71
noticeable       0.71
worrisome        0.70
exciting         0.70
Activations Density: 0.023%