Explanations
phrases related to personal opinions or preferences
Negative Logits
idium       -0.70
concess     -0.70
Mandatory   -0.61
isoft       -0.60
ivari       -0.60
puff        -0.58
rolet       -0.58
ignore      -0.58
cussion     -0.57
oiler       -0.57
Positive Logits
fault          0.80
ById           0.76
ibility        0.74
shelter        0.73
irresistible   0.72
inspiration    0.72
ered           0.71
refuge         0.71
amusement      0.71
satisfaction   0.70
Activations Density: 0.063%