INDEX
Explanations
questions about interest or desire
Negative Logits
Yep           0.74
variable      0.72
Yep           0.72
podremos      0.71
बताऊंगा       0.71
rž            0.70
yep           0.70
জানে          0.70
competente    0.70
können        0.69
Positive Logits
bothers       1.73
excites       1.65
bothering     1.65
inspires      1.65
interests     1.47
fascin        1.44
feels         1.42
appeals       1.40
bother        1.35
inspire       1.35
Activations Density: 0.947%