Explanations
asking for help or information
Negative Logits
しっかり  0.38
しっかり  0.36
жет  0.36
неравен  0.35
seizing  0.35
attacking  0.35
الحره  0.34
用心  0.34
नक्की  0.34
悪  0.34
Positive Logits
assistance  1.38
help  1.23
aiuto  1.19
assistance  1.11
hjälp  1.03
помощь  1.02
bantuan  1.02
Hilfe  1.00
Assistance  0.98
помощи  0.98
Activations Density 0.026%