INDEX
Explanations
phrases that begin with "What's" or similar constructions
repeated instances of the word "what."
Negative Logits
Token       Logit
horizont    -0.78
seiz        -0.70
zn          -0.66
Buyable     -0.63
indoor      -0.62
Tid         -0.62
Pact        -0.61
Tome        -0.60
ortium      -0.60
Shap        -0.59
Positive Logits
Token       Logit
¬           0.95
¡           0.90
ı           0.90
¢           0.86
º           0.84
Ķ           0.84
į           0.83
else        0.82
IJ          0.82
ª           0.79
Activations Density 0.093%