INDEX
Explanations
phrases indicating permission or invitation
phrases encouraging participation or expression of opinions
Negative Logits
Token      Logit
dust       -0.62
effected   -0.60
issance    -0.59
Chip       -0.58
ilater     -0.58
ilogy      -0.58
lines      -0.57
memory     -0.56
abal       -0.56
emet       -0.56
Positive Logits
Token      Logit
zee        0.75
zing       0.71
nels       0.69
zers       0.68
────────   0.66
bies       0.65
Apply      0.65
ptin       0.61
闘         0.61
:)         0.60
Activations Density 0.021%