Explanations
instances where "mind" or thoughts are mentioned in various contexts
Negative Logits
irements    -0.69
hered       -0.65
ilon        -0.62
imer        -0.61
itud        -0.60
elin        -0.60
otta        -0.59
mouth       -0.59
Cod         -0.59
rollout     -0.59
Positive Logits
anza        0.66
è£ıè        0.65
briefly     0.64
ffer        0.63
wondering   0.62
ovie        0.59
oldown      0.58
scape       0.58
chwitz      0.58
when        0.57
Activations Density: 0.012%