INDEX
Explanations
the term "hobby" and variants indicating leisure activities
Negative Logits
Token        Logit
ulpt         -0.16
çĭIJ          -0.15
/Graphics    -0.15
ιβ           -0.15
äl           -0.14
voks         -0.14
ázev         -0.14
èįIJ          -0.14
ħĮ           -0.14
eners        -0.14
Positive Logits
Token        Logit
iez          0.17
fully        0.16
egin         0.16
beer         0.16
highway      0.15
th           0.15
ites         0.15
Peng         0.15
se           0.14
idor         0.14
Activations Density: 0.002%