INDEX
Explanations
expressions of appreciation and positivity
Negative Logits
noDo           -0.60
BrowserModule  -0.48
Eminem         -0.48
myſelf         -0.48
rillation      -0.48
Battlefield    -0.47
Paglinawan     -0.47
ſelf           -0.47
₁,             -0.47
Descartes      -0.47
Positive Logits
lovely     0.92
Lovely     0.90
lovely     0.90
Lovely     0.87
nice       0.69
Nice       0.67
nice       0.67
wonderful  0.64
Nice       0.64
Wonderful  0.59
Activations Density 0.001%