Explanations
phrases starting with "After"
transitions or continuations in narrative context
Negative Logits
JV            -0.72
NRS           -0.70
DN            -0.70
Especially    -0.69
女            -0.69
constitu      -0.68
IZE           -0.67
uci           -0.66
çī            -0.62
NR            -0.62
Positive Logits
wards         1.44
noon          1.38
ward          1.37
math          1.24
words         1.06
awhile        1.02
word          1.01
careful       0.98
graduating    0.96
inspecting    0.96
Activations Density: 0.070%