INDEX
Explanations
personal growth and self-acceptance
Negative Logits
almost         0.70
several        0.62
               0.62
only           0.61
consists       0.60
approximately  0.60
null           0.59
consisting     0.59
dozen          0.59
usually        0.58
Positive Logits
togetherness   1.02
และการ         0.94
的重要性        0.88
enfance        0.86
และความ        0.85
आणि            0.82
menging        0.82
爱情            0.81
અને            0.79
heroism        0.78
Activations Density: 0.004%