INDEX
Explanations
phrases encouraging experimentation or attempts at new activities
Negative Logits
Token            Logit
attempt          -0.85
attempts         -0.84
Attempt          -0.81
Olsen            -0.77
attempting       -0.77
Mocking          -0.76
مراجع            -0.75
attempts         -0.75
RenderAtEndOf    -0.72
Attempts         -0.72
Positive Logits
Token            Logit
wamy             0.60
Trasp            0.59
fly              0.58
homonymie        0.57
propOrder        0.57
ority            0.54
ilkan            0.54
iteracy          0.54
wir              0.54
featureID        0.54
Activations Density 0.028%
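
A density figure like the one above can be read as the share of token positions on which the feature fires. The sketch below assumes that definition (activation strictly greater than zero counts as firing); the page itself does not state how the value was computed. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and exist only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means (number of firing positions) / (total positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical data: 10,000 token positions, 3 of which activate the feature.
sample = [0.0] * 9997 + [1.2, 0.4, 2.7]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.030%
```

Under this reading, a density of 0.028% would correspond to roughly 28 firing positions per 100,000 tokens.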