Explanations
something happened or changed
Negative Logits
ที่คุณ  0.40
崦  0.39
पाहून  0.38
που  0.37
送到  0.36
që  0.35
Loc  0.35
имају  0.34
你能  0.34
ಕಂಡ  0.34
Positive Logits
shifts  0.64
shifted  0.60
prevents  0.59
compels  0.58
feels  0.55
interferes  0.50
distinguishes  0.50
Shifts  0.50
inhibits  0.49
seems  0.49
Activations Density 0.005%