INDEX
Explanations
phrases or words containing "wh"
the occurrence of the substring "wh" in various contexts
Negative Logits
mosaic      -0.70
PORT        -0.61
Prin        -0.58
punishable  -0.58
theatre     -0.58
heal        -0.57
theater     -0.57
enclosed    -0.57
Grande      -0.57
attackers   -0.57
Positive Logits
omever      1.44
irling      1.38
ipl         1.22
arf         1.22
olen        1.21
irl         1.21
istle       1.19
izz         1.19
itt         1.15
acky        1.12
Activations Density 0.009%
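
One way to relate the positive-logit tokens to the "wh" explanations is to prepend "wh" to each token and look at the resulting strings. The sketch below assumes the listed tokens are continuations the feature promotes after a "wh" token. That reading is an interpretation of the lists above, not something the page states. The helper name `prepend_wh` is hypothetical.

```python
# Top positive-logit tokens, copied from the Positive Logits list above.
positive_tokens = [
    "omever", "irling", "ipl", "arf", "olen",
    "irl", "istle", "izz", "itt", "acky",
]

def prepend_wh(tokens, prefix="wh"):
    """Join the prefix to each token to show the string each continuation would form.

    Assumption: the feature fires on a "wh" token and these tokens follow it.
    """
    return [prefix + token for token in tokens]

completions = prepend_wh(positive_tokens)
for token, word in zip(positive_tokens, completions):
    print(f"{token:<8} {word}")
```

Several results are complete words (whomever, whirling, wharf, whirl, whistle, whizz, whacky), while others (whipl, wholen, whitt) are only fragments that would need further tokens to form a word.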