INDEX
Explanations
if/when followed by a question or choice
Negative Logits
순  -  0.38
veritable  -  0.37
بعض  -  0.37
인해  -  0.37
有一些  -  0.36
해당  -  0.35
ตั้งแต่  -  0.34
nogle  -  0.34
بشكل  -  0.34
jaw  -  0.33
Positive Logits
selecting  -  0.51
asked  -  0.50
requesting  -  0.48
selecting  -  0.48
choosing  -  0.48
asked  -  0.47
chosen  -  0.46
entering  -  0.45
enrolled  -  0.44
employed  -  0.43
Activations Density: 0.125%