INDEX
Explanations
phrases related to personal decisions or choices
NEGATIVE LOGITS
Token        Logit
ATURES       -0.85
iffs         -0.75
casters      -0.74
marks        -0.72
stars        -0.71
reports      -0.71
ometers      -0.69
killers      -0.69
Americans    -0.68
avers        -0.68
POSITIVE LOGITS
Token        Logit
nutshell      1.40
hurry         1.19
manner        1.11
vein          1.02
crowded       0.99
classroom     0.94
timely        0.93
vain          0.92
heartbeat     0.92
vacuum        0.90
Activations Density: 0.154%