INDEX
Explanations
references to positivity or upbeat themes
Negative Logits
istrovstvÃŃ    -0.15
tr             -0.15
eker           -0.15
inea           -0.15
pool           -0.14
acji           -0.14
Ĺi             -0.14
ispers         -0.14
DMI            -0.14
emente         -0.14
Positive Logits
ITIVE          0.27
itivity        0.24
ITIONS         0.24
itional        0.23
izione         0.23
itive          0.22
session        0.21
idon           0.21
itives         0.21
iciones        0.21
Activations Density: 0.014%