Explanations
phrases related to asking questions or seeking information
references to the concept of "what" in various contexts
Negative Logits
enburg    -0.84
robe      -0.73
gur       -0.73
gi        -0.68
xon       -0.67
kj        -0.66
odge      -0.60
uge       -0.60
arella    -0.60
favor     -0.58

Positive Logits
transpired     1.39
happens        1.34
happened       1.30
constitutes    1.21
soever         1.09
awaits         0.99
separates      0.97
happ           0.97
exactly        0.96
constituted    0.94
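
Logit lists like the two above are commonly obtained by projecting a feature's decoder direction onto the model's unembedding matrix, which gives one score per vocabulary token, and then sorting. The sketch below assumes that method; the page itself does not say how its lists were computed. The names `decoder_vec`, `unembed`, `logit_effects`, and `top_and_bottom` are hypothetical, and the toy vocabulary and weights are made up for illustration.

```python
def logit_effects(decoder_vec, unembed, vocab):
    """Score each token as dot(decoder_vec, unembed[:, j]).

    decoder_vec: list of d_model floats (hypothetical feature decoder row).
    unembed: d_model x vocab_size nested list (hypothetical unembedding).
    """
    d_model = len(decoder_vec)
    scores = {}
    for j, token in enumerate(vocab):
        scores[token] = sum(decoder_vec[i] * unembed[i][j] for i in range(d_model))
    return scores


def top_and_bottom(scores, k):
    """Return the k highest-scoring tokens (descending) and the k lowest (most negative first)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]


# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["happens", "favor", "exactly", "robe"]
decoder_vec = [1.0, 0.5]
unembed = [
    [1.0, -0.5, 0.5, -1.0],
    [0.5, 0.0, 0.5, 0.5],
]
scores = logit_effects(decoder_vec, unembed, vocab)
top, bottom = top_and_bottom(scores, 2)
print("positive:", top)
print("negative:", bottom)
```

Sorting the negative list with the most negative token first mirrors the ordering used in the Negative Logits list.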

Activation Density: 0.107%
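
Activation density is usually the fraction of tokens, over some evaluation dataset, on which the feature fires. The sketch below assumes a token counts as active when its activation is strictly above a threshold of 0.0; the actual threshold and dataset behind the 0.107% figure are not stated on this page, and `activation_density` is a hypothetical helper.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold` (assumed 0.0)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy example: one active token out of 1,000.
acts = [0.0] * 999 + [2.3]
density = activation_density(acts)
print(f"{density:.3%}")
```

Under this definition, a density of 0.107% corresponds to the feature firing on roughly 1 token in every 935.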