INDEX
Explanations
spending time learning or exploring
Negative Logits
"!        0.47
όταν      0.45
عندما     0.43
Lors      0.40
حتى       0.40
quando    0.39
पहली      0.39
когда     0.39
cuando    0.38
образа    0.38
Positive Logits
studying         0.60
exploring        0.59
investigating    0.53
discussing       0.52
analyzing        0.52
researching      0.51
preparing        0.49
constructing     0.48
revising         0.47
im               0.46
Activations Density: 0.016%