INDEX
Explanations
references to leisure time or free time activities
Negative Logits
Token     Logit
jde       -0.16
emouth    -0.15
antha     -0.15
UST       -0.15
bote      -0.15
bak       -0.15
intern    -0.14
_tac      -0.14
eria      -0.14
pedia     -0.14
Positive Logits
Token     Logit
com       0.15
füg       0.14
ptime     0.14
λεÏį      0.14
Scri      0.14
Schultz   0.14
ãģ£ãģ     0.14
scal      0.13
pi        0.13
border    0.13
Activations Density 0.008%