INDEX
Explanations
phrases starting with "wh" followed by a space
occurrences of the substring "wh"
Negative Logits
advance         -0.71
Stra            -0.70
interstitial    -0.68
bed             -0.68
Grande          -0.66
atory           -0.64
Erdogan         -0.63
Bach            -0.62
Manual          -0.61
Kub             -0.61
Positive Logits
ilst            1.22
soever          1.12
istle           1.07
ispers          1.06
olly            1.04
orf             0.97
ocom            0.95
urst            0.94
atson           0.93
irlwind         0.91
Activations Density 0.005%