INDEX
Explanations
anticipating questions or objections
NEGATIVE LOGITS
Token     Logit
д         1.21
с         1.12
ти        1.05
د         0.98
ime       0.91
ier       0.90
nesses    0.90
ک         0.89
iden      0.88
या        0.87
POSITIVE LOGITS
Token     Logit
。        1.17
p         1.16
0         1.16
l         1.15
w         1.13
៩         1.13
FOR       1.12
3         1.11
צ         1.10
៣         1.10
Activations Density 0.005%