INDEX
Explanations
phrases or words that start with 'wh'
instances of the token "wh", indicating a focus on questions or question phrases
Negative Logits
mosaic         -0.65
silenced       -0.63
enclosed       -0.63
condolences    -0.62
marking        -0.61
dividing       -0.61
expressing     -0.61
starters       -0.60
issuer         -0.59
condu          -0.59
Positive Logits
omever         1.69
irling         1.46
ipl            1.30
izz            1.29
arf            1.28
irl            1.27
ittle          1.27
olen           1.26
acky           1.26
itt            1.23
Activations Density 0.008%