INDEX
Explanations
expressions related to enjoyment and positive experiences
Negative Logits
asio      -0.15
лоп       -0.14
orde      -0.14
/DD       -0.14
ernote    -0.14
mesinin   -0.14
/Dk       -0.13
Boeh      -0.13
echan     -0.13
tesy      -0.13
Positive Logits
cool      1.06
Cool      0.90
cool      0.86
Cool      0.85
coolest   0.68
cooler    0.68
neat      0.63
cooled    0.52
Cooler    0.52
cooling   0.51
Activations Density: 0.475%