INDEX
Explanations
expressions related to personal interests and passions
Negative Logits
Token            Logit
AxisAlignment    -0.55
שוליים           -0.54
становника       -0.53
ImGui            -0.52
Hentet           -0.51
ivelany          -0.49
protoc           -0.49
twimg            -0.49
الدراسه          -0.49
ImGui            -0.48
Positive Logits
Token            Logit
passion          2.19
love             1.82
passion          1.80
passione         1.73
Passion          1.68
pasión           1.67
Passion          1.65
paixão           1.65
passionate       1.64
passions         1.62
Activations Density: 0.275%