INDEX
Explanations
phrases related to personal experiences or sentiments
references to musical albums and songs
Negative Logits
immersion       -0.86
unnecessarily   -0.84
anx             -0.76
barriers        -0.76
redundancy      -0.76
deleg           -0.75
disadvantages   -0.75
breaching       -0.74
deterrence      -0.73
caution         -0.73
Positive Logits
Lonely     1.26
Gone       1.19
Wrong      1.17
Tonight    1.16
Like       1.15
Alone      1.14
Called     1.12
Changed    1.12
Again      1.12
Alright    1.11
Activations Density: 0.207%