Explanations
phrases related to enjoyment and participation in activities
Negative Logits
Token     Logit
fade      -0.15
Favor     -0.14
idelity   -0.14
880       -0.13
ftar      -0.13
isc       -0.13
ΕΤ        -0.13
цвет      -0.13
504       -0.13
oud       -0.13
Positive Logits
Token     Logit
fun       0.84
fun       0.68
Fun       0.68
FUN       0.66
Fun       0.64
FUN       0.58
.fun      0.56
_fun      0.55
fun       0.54
(fun      0.51
Activations Density: 0.139%