INDEX
Explanations
following introductory phrases
NEGATIVE LOGITS
ถ้า  0.89
lots  0.77
ถ้า  0.77
usually  0.73
خیلی  0.73
trying  0.73
sometimes  0.72
אבל  0.71
kalau  0.71
things  0.70
POSITIVE LOGITS
Through  0.96
Through  0.88
Specifically  0.88
Utilizing  0.86
Specifically  0.84
Currently  0.82
Notably  0.80
Following  0.79
Leveraging  0.76
Furthermore  0.76
Activations Density 0.064%