Explanations
mentions of yoga and gymnastics
Negative Logits
hyde           -0.82
vous           -0.73
etary          -0.73
strous         -0.72
*/(            -0.71
endez          -0.71
ictional       -0.70
ighty          -0.69
ilial          -0.68
Snowden        -0.66
Positive Logits
meditation      0.99
yoga            0.98
instructor      0.96
mats            0.92
pants           0.89
oga             0.87
instructors     0.87
nas             0.86
routines        0.82
Yoga            0.82
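The two lists above are the tokens whose logits this feature pushes down and up most strongly. The page does not say how the scores are produced. A common approach, assumed here, is to project the feature's decoder direction onto the model's unembedding matrix to get one score per vocabulary token, then keep the k highest and k lowest. The sketch below uses plain Python lists; `logit_effects` and `rank_logits` are hypothetical helper names, and the unembedding is taken to have shape (d_model x vocab).

```python
from typing import Dict, List, Tuple

def logit_effects(direction: List[float],
                  unembed: List[List[float]],
                  vocab: List[str]) -> Dict[str, float]:
    """Score each token by the dot product of the feature direction with its unembedding column.

    Assumption: unembed has one row per residual-stream dimension and one
    column per vocabulary token (d_model x vocab).
    """
    return {
        tok: sum(direction[i] * unembed[i][j] for i in range(len(direction)))
        for j, tok in enumerate(vocab)
    }

def rank_logits(scores: Dict[str, float],
                k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ordered = sorted(scores.items(), key=lambda item: item[1])  # ascending by score
    negative = ordered[:k]           # most negative first
    positive = ordered[::-1][:k]     # most positive first
    return positive, negative

if __name__ == "__main__":
    # A few scores copied from the lists above.
    page = {"meditation": 0.99, "yoga": 0.98, "hyde": -0.82, "vous": -0.73}
    top, bottom = rank_logits(page, 2)
    print(top)     # [('meditation', 0.99), ('yoga', 0.98)]
    print(bottom)  # [('hyde', -0.82), ('vous', -0.73)]
```

Sorting once and slicing from both ends keeps the positive and negative lists consistent with each other; for a full vocabulary a partial selection such as `heapq.nlargest` would avoid sorting every token.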
Activation Density: 0.008%
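Activation density is read here, as an assumption, as the percentage of token positions on which the feature fires (activation above zero). Under that reading, 0.008% means roughly 8 firings per 100,000 tokens, so this is a very sparse feature. The sketch below computes the figure from a list of per-token activations; `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: a feature "fires" on a token when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

if __name__ == "__main__":
    # 8 nonzero activations spread over 100,000 tokens.
    acts = [0.0] * 100_000
    for i in range(8):
        acts[i * 1000] = 1.5
    print(f"{activation_density(acts):.3f}%")  # 0.008%
```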