Explanations
phrases related to happiness or positive feelings
Negative Logits
guiActiveUn   -0.63
Kuwait        -0.61
fixation      -0.60
trl           -0.60
senal         -0.59
Divinity      -0.58
VIDEOS        -0.58
Nile          -0.57
Schiff        -0.57
Engineers     -0.57
Positive Logits
ened          1.23
ily           1.17
ening         1.13
iest          1.13
iness         1.12
ings          0.95
earance       0.94
ert           0.94
erc           0.93
ier           0.90
Activations Density: 0.020%