Explanations
phrases related to timing or instances of "first time" actions
references to "first time" experiences or occurrences
Negative Logits
inth        -0.71
irs         -0.66
rosso       -0.65
Journals    -0.64
ement       -0.63
addy        -0.60
XT          -0.60
irst        -0.59
oret        -0.59
ipation     -0.58
Positive Logits
round         0.75
encountering  0.72
someone       0.71
around        0.70
somebody      0.69
consuming     0.69
frame         0.67
earthqu       0.62
abl           0.60
bitten        0.59
Activations Density: 0.052%