Explanations
phrases starting with "one of the"
phrases that emphasize specific instances or examples of something
Negative Logits
Token     Logit
hya       -0.65
itures    -0.64
anse      -0.62
enery     -0.60
ikarp     -0.59
ateurs    -0.58
iture     -0.58
ynt       -0.57
berus     -0.57
alez      -0.57
Positive Logits
Token      Logit
arching    0.83
icial      0.77
nutshell   0.74
course     0.71
ours       0.67
course     0.66
paramount  0.65
nic        0.65
wonders    0.64
hundred    0.64
Activations Density: 0.089%