INDEX
Explanations
expressions of enjoyment or fun
Negative Logits
Награды            -0.67
littéraire         -0.63
것이다             -0.63
zeichnete          -0.56
gesetzt            -0.56
pouce              -0.55
uttered            -0.55
Viited             -0.55
silence            -0.54
ModelExpression    -0.54
Positive Logits
fun                4.77
fun                3.87
Fun                3.74
Fun                3.73
FUN                3.49
FUN                3.17
divertido          2.34
diversión          2.10
Spaß               2.08
enjoyable          1.97
Activations Density 0.030%