INDEX
Explanations
exploring curiosity or interest
Negative Logits
মোটামুটি    0.49
简直    0.43
डिग्री    0.40
waiters    0.39
peanuts    0.39
בעי    0.38
!">    0.37
delicios    0.37
specialize    0.37
downright    0.37
POSITIVE LOGITS
curiosity    0.80
寻求    0.70
관심    0.63
interesse    0.63
जिज्ञासा    0.62
관심을    0.62
好奇    0.61
seeking    0.61
seek    0.60
interest    0.59
Activations Density: 0.068%