INDEX
Explanations
sequences related to questioning quantities and aspects
Negative Logits
kus       -0.74
ixel      -0.72
table     -0.71
rall      -0.69
rites     -0.66
thal      -0.65
qt        -0.64
porate    -0.63
ipment    -0.63
riz       -0.62
Positive Logits
anymore        0.80
misunder       0.79
else           0.78
coincidence    0.77
anybody        0.76
explan         0.72
consolation    0.71
irony          0.65
exactly        0.65
anyone         0.64
Activations Density 0.202%