Explanations
he, she, names, followed by action or feeling
Negative Logits
Apparently    0.93
apparently    0.93
apparently    0.89
seeming       0.89
Apparently    0.87
apparent      0.86
meille        0.79
我们也        0.77
居然          0.76
craziness     0.76
Positive Logits
knew          1.20
feel          1.19
feels         1.18
watched       1.14
knows         1.13
জানে          1.11
feel          1.07
imagines      1.05
感觉到        1.02
siente        1.01
Activations Density: 0.101%