INDEX
Explanations
phrases starting with "What's" or "What is."
the word "what."
Negative Logits
Lauder      -0.70
resorts     -0.64
Ivy         -0.59
sie         -0.58
segments    -0.57
Zion        -0.57
snail       -0.56
Hug         -0.55
trusts      -0.53
continents  -0.53
Positive Logits
happening   0.93
gonna       0.93
omething    0.92
transpired  0.87
happened    0.85
wered       0.81
happ        0.80
peed        0.79
ensibly     0.79
arthed      0.75
Activations Density: 0.032%