Explanations
questions that begin with "which."
Negative Logits
Token        Logit
sson         -0.17
loff         -0.15
loid         -0.15
ict          -0.15
egg          -0.15
stuff        -0.14
erva         -0.14
ran          -0.14
ust          -0.14
udiantes     -0.14
Positive Logits
Token        Logit
soever       0.37
-ever        0.28
direction    0.26
ones         0.25
/how         0.24
Wich         0.23
именно       0.21
among        0.21
-direction   0.20
among        0.20
Activations Density: 0.030%