Explanations
words related to relaxation and free time activities
terms related to leisure activities
Negative logits
Token        Logit
imposed      -0.69
ars          -0.66
NAT          -0.65
inx          -0.64
gran         -0.63
wound        -0.61
DOT          -0.61
Rebellion    -0.59
antis        -0.59
ARS          -0.58
Positive logits
Token        Logit
leisure       1.15
isure         1.11
trave         0.89
ende          0.83
bnb           0.83
shire         0.82
emouth        0.82
lihood        0.81
seiz          0.81
intrins       0.77
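Dashboards of this kind usually rank vocabulary tokens by the feature's direct effect on the output logits. The sketch below is a minimal illustration, assuming that effect is the dot product of the feature's decoder direction with each column of the unembedding matrix (no LayerNorm or bias folding). The names `decoder_vec`, `W_U`, and `top_bottom_logits` are hypothetical, and the toy numbers are illustrative, not this feature's actual weights.

```python
import numpy as np


def top_bottom_logits(decoder_vec, W_U, vocab, k=10):
    """Rank vocabulary tokens by a feature's direct effect on the logits.

    Assumption: the effect on token i is decoder_vec @ W_U[:, i], with no
    LayerNorm or bias folding. All names here are hypothetical.
    """
    effects = decoder_vec @ W_U  # shape: (vocab_size,)
    order = np.argsort(effects)  # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: d_model = 2, vocabulary of 4 placeholder tokens.
decoder_vec = np.array([1.0, 0.0])
W_U = np.array([
    [1.15, -0.69, 0.20, 0.00],
    [0.50, 0.50, 0.50, 0.50],
])
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
negative, positive = top_bottom_logits(decoder_vec, W_U, vocab, k=2)
print("negative:", negative)
print("positive:", positive)
```

With a real model, `W_U` would be the (d_model × vocab_size) unembedding matrix, and `k=10` would match the ten entries shown in each list.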
Activation density: 0.008%
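Activation density is usually the fraction of tokens on which the feature fires, so 0.008% corresponds to roughly 8 of every 100,000 tokens. The sketch below is a minimal illustration, assuming "fires" means an activation strictly above zero. The dashboard's exact threshold is not stated, and `activation_density` is a hypothetical helper.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Assumption: a token counts as active when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))


# Toy example: the feature fires on 8 of 100,000 tokens.
acts = np.zeros(100_000)
acts[:8] = 2.5
density = activation_density(acts)
print(f"Activation density: {density:.3%}")  # Activation density: 0.008%
```

A density this low marks a sparse feature that is active on only a small, specific set of contexts.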