INDEX
Explanations
expressions of confusion or requests for help
Negative Logits
Token             Logit
enia              -0.15
ActionTypes       -0.14
angu              -0.14
umo               -0.14
ikip              -0.14
sell              -0.14
Congratulations   -0.13
stav              -0.13
ese               -0.13
907               -0.13
Positive Logits
Token             Logit
help              0.25
assistance        0.23
Help              0.23
Hilfe             0.21
-help             0.21
HELP              0.20
Assistance        0.20
Help              0.19
appreciated       0.19
appreciate        0.19
Activations Density: 0.066%