INDEX
Explanations
adverbs indicating time (e.g., "just", "now", "often")
adverbs indicating time or frequency
Negative Logits
successor   -0.68
Mats        -0.66
Subject     -0.64
chants      -0.60
omn         -0.59
++++        -0.57
jokes       -0.55
Subject     -0.54
prone       -0.54
hander      -0.53
Positive Logits
been        1.49
been        1.15
undergone   0.99
Been        0.92
gone        0.91
fallen      0.86
gotten      0.86
gotten      0.85
become      0.83
gone        0.82
Activations Density: 0.169%