INDEX
Explanations
terms related to health and wellness, particularly in the context of adolescents and diet
Negative Logits
كومونز    -1.07
<bos>    -0.99
propOrder    -0.98
AsUp    -0.94
WriteBarrier    -0.92
للمعارف    -0.92
Baillargeon    -0.90
Efq    -0.89
مرئيه    -0.89
itſelf    -0.89
Positive Logits
↵↵    0.91
I    0.59
↵    0.57
<eos>    0.55
T    0.54
[token not recoverable]    0.54
.    0.52
The    0.52
[token not recoverable]    0.50
↵↵↵    0.49
Activations Density 0.596%
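
The density figure above can be read as the share of sampled token positions on which the feature fires. That reading is an assumption, since the dashboard does not state its definition. A minimal sketch under that assumption follows; the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds threshold.

    Assumption: "density" means the fraction of sampled token positions
    with an activation strictly above the threshold, shown as a percentage.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 1,000 token positions, 6 of which activate the feature.
sample = [0.0] * 994 + [0.3, 1.2, 0.7, 2.5, 0.1, 0.9]
print(f"{activation_density(sample):.3f}%")
```

With real data, `activations` would hold one value per sampled token for this feature; a value near 0.596% would mean the feature is active on roughly 6 in every 1,000 tokens.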