Explanations
phrases expressing superlatives or the best among options
Negative Logits
ertz          -0.13
/wiki         -0.13
odule         -0.13
_parms        -0.13
λη            -0.13
respective    -0.13
ãĤ¤ãĥĦ        -0.12
orie          -0.12
insky         -0.12
Af            -0.12
Positive Logits
thing         0.35
thing         0.26
question      0.24
Thing         0.23
Thing         0.22
reason        0.21
benefit       0.19
advantage     0.19
(thing        0.18
concern       0.17
Activation Density: 0.235%