Explanations
phrases related to asking for or giving explanations or instructions
expressions indicating uncertainty or questions about value or worth
Negative Logits
spir       -0.65
isp        -0.64
native     -0.62
ccoli      -0.62
glyph      -0.59
£ı         -0.59
undert     -0.59
migr       -0.58
Janeiro    -0.57
footh      -0.57
Positive Logits
Answer          0.98
Nope            0.92
?????-?????-    0.87
Answer          0.83
.?              0.82
answered        0.71
Reference       0.68
?,              0.67
voucher         0.67
Ü               0.67
Activations Density 0.183%