INDEX
Explanations
expressions that solicit feedback or input from the audience
Negative Logits
ãģ¾ãģ¾    -0.16
rema      -0.15
Becker    -0.14
strup     -0.14
Dün       -0.14
got       -0.14
sten      -0.13
tvrt      -0.13
.rpm      -0.13
ingle     -0.13
Positive Logits
know      0.29
hear      0.23
knows     0.20
known     0.17
aware     0.16
know      0.16
hearing   0.16
savoir    0.16
what      0.16
Know      0.15
Activations Density 0.020%
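
The first entry in the negative-logit list, `ãģ¾ãģ¾`, looks like a token printed in the display alphabet of a byte-level BPE vocabulary rather than as readable text. The sketch below shows one way such a string could be turned back into text. It assumes the GPT-2 style byte-to-character table, which the page itself does not confirm, and the helper names `bytes_to_unicode` and `decode_token` are illustrative rather than part of any dashboard API.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Byte -> display character table (assumption: GPT-2 style byte-level BPE)."""
    # Bytes that are already printable are displayed as themselves.
    keep = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    table = {b: chr(b) for b in keep}
    # Every other byte is displayed as a code point starting at 256, in byte order.
    offset = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + offset)
            offset += 1
    return table


def decode_token(display: str) -> str:
    """Map a displayed token string back to its raw bytes, then decode as UTF-8."""
    inverse = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(inverse[ch] for ch in display)
    # Partial multi-byte tokens are not valid UTF-8 on their own, hence "replace".
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_token("ãģ¾ãģ¾"))
    print(repr(decode_token("Ġknow")))
```

Under that assumption the string decodes to the Japanese "まま", and a leading `Ġ` decodes to a space. Tokens that the page already shows in readable form, such as `Dün`, should not be passed through this helper.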