INDEX
Explanations
generic positive descriptors or greetings
greetings or expressions of pleasantry
Negative Logits
ories      -0.81
IOC        -0.69
Frie       -0.68
olog       -0.68
oret       -0.68
obbies     -0.68
Genie      -0.67
doping     -0.65
stranger   -0.65
ophob      -0.64
Positive Logits
ciating    0.80
nesday     0.72
psc        0.71
aturday    0.70
²¾         0.70
thora      0.69
tick       0.67
morrow     0.66
liction    0.65
sembly     0.65
Activations Density: 0.000%