INDEX
Explanations
emphasis or curiosity expressed through the word "what"
interrogative phrases or words
Negative Logits
uling     -0.68
psc       -0.66
uating    -0.63
ean       -0.62
ulkan     -0.61
istani    -0.61
Ń·        -0.60
mens      -0.59
atis      -0.59
POR       -0.59
Positive Logits
soever      1.35
else        1.01
happened    0.99
ensued      0.95
happens     0.94
transpired  0.91
better      0.89
follows     0.81
?!          0.80
separates   0.80
Activations Density: 0.076%