INDEX
Explanations
specific subject and its state
Negative Logits
feeling      0.64
curious      0.57
impactful    0.57
bringing     0.57
optimistic   0.54
wondering    0.53
tweak        0.52
picking      0.51
convince     0.51
factoring    0.50
Positive Logits
は           0.88
是           0.76
は           0.75
는           0.69
is           0.69
은           0.66
was          0.65
may          0.64
wird         0.64
must         0.63
Activations Density: 0.320%