Explanations
the word "loving" or words associated with love and affection
references to the word "love."
Negative Logits
Token       Logit
Cardinal    -0.70
bailout     -0.69
testament   -0.66
standard    -0.65
opsy        -0.65
GC          -0.63
Racial      -0.63
Billboard   -0.63
Penny       -0.63
Crystal     -0.62
Positive Logits
Token       Logit
lov         5.20
liv         1.65
lav         1.39
hov         1.18
lain        1.17
akov        1.17
lik         1.13
д           1.11
lr          0.98
ov          0.98
Activations Density: 0.027%