Explanations
expressions of happiness or well-being
Negative Logits
chers     -0.17
pc        -0.16
antro     -0.16
izzo      -0.16
ersh      -0.15
plevel    -0.14
presso    -0.14
sj        -0.14
_OBS      -0.14
olls      -0.14
Positive Logits
-go       0.37
camper    0.27
endings   0.25
ending    0.25
Ending    0.24
-medium   0.24
hour      0.23
Hour      0.23
camp      0.22
Camp      0.22
Activations Density 0.026%