INDEX
Explanations
phrases or terms related to first occurrences or initial experiences
Negative Logits
Token       Logit
ixa         -0.18
first       -0.18
ixo         -0.17
rzy         -0.16
further     -0.16
forth       -0.15
firstly     -0.15
essler      -0.15
cken        -0.15
dal         -0.15
Positive Logits
Token       Logit
-ever       0.38
s           0.34
tiên        0.30
-hand       0.29
-rate       0.29
born        0.28
timers      0.25
-order      0.25
responders  0.24
-time       0.24
Activations Density: 0.130%