INDEX
Explanations
events related to someone achieving something for the first time
phrases indicating significant first achievements
Negative Logits
bara     -0.97
itself   -0.61
Else     -0.61
mosp     -0.60
ribed    -0.59
facts    -0.59
urus     -0.59
models   -0.59
Mub      -0.59
Sorce    -0.58
Positive Logits
foray         1.04
Flavoring     1.03
consecutive   0.99
ever          0.97
anniversary   0.87
outing        0.85
birthday      0.84
Ever          0.84
appearance    0.77
EVER          0.77
Activations Density: 0.076%