Explanations
don't worry, forget, hesitate
Negative Logits
ب          0.43
Prof       0.41
In         0.40
Whenever   0.39
वी         0.38
ב          0.38
語         0.38
ஜோ         0.37
ลอง        0.37
employs    0.36
Positive Logits
worry          0.64
forget         0.55
khawatir       0.48
shy            0.47
Worry          0.47
hesitate       0.47
forget         0.44
worrying       0.43
underestimate  0.43
mistake        0.42
Activations Density: 0.014%