INDEX
Explanations
references to specific movies, television shows, and characters
occurrences of the word "the"
Negative Logits
fax                               -0.74
‑ (non-breaking hyphen, U+2011)   -0.73
rade                              -0.71
strap                             -0.70
onse                              -0.70
Scotland                          -0.69
Iterator                          -0.68
nesty                             -0.68
gpu                               -0.67
Albania                           -0.67
Positive Logits
latter            1.19
aforementioned    1.17
same              1.15
earliest          1.04
latest            1.03
entirety          1.03
entire            1.02
smallest          0.99
infamous          0.98
slightest         0.97
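Logit lists like the two above are commonly produced for sparse-autoencoder features by multiplying the feature's decoder direction by the model's unembedding matrix and keeping the k lowest and k highest scores. Assuming this dashboard follows that convention (the page does not say so, and any normalization it applies is unknown), a minimal sketch of the ranking step is below. The function names `logit_effects` and `top_and_bottom`, the toy weights, and the three-token vocabulary are all hypothetical and are not taken from the dashboard.

```python
# Hypothetical sketch: rank vocabulary tokens by how much a feature's
# decoder direction pushes each output logit up or down (decoder @ W_U).
# Pure Python with toy data; a real dashboard would use model weights.
from typing import List, Sequence, Tuple


def logit_effects(decoder_vec: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Return decoder_vec @ unembed: one score per vocabulary token.

    `unembed` is laid out d_model x vocab_size, so column t holds the
    unembedding vector of token t.
    """
    d_model = len(decoder_vec)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_vec[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_and_bottom(scores: Sequence[float], vocab: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]],
                                    List[Tuple[str, float]]]:
    """Return (k most negative, k most positive) (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    bottom = ranked[:k]                 # ascending: most negative first
    top = list(reversed(ranked))[:k]    # descending: most positive first
    return bottom, top


# Toy example: d_model = 2, vocabulary of 3 tokens.
decoder = [1.0, -0.5]
W_U = [[0.2, -0.4, 0.9],
       [0.6, 0.1, -0.2]]
vocab = ["same", "fax", "latter"]

scores = logit_effects(decoder, W_U)
negative, positive = top_and_bottom(scores, vocab, k=1)
print("Negative:", negative)
print("Positive:", positive)
```

With real weights, `vocab` would hold the tokenizer's decoded token strings, which is why byte-level tokens such as the non-breaking hyphen can appear in the lists.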
Activation Density: 1.228%
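Activation density is usually read as the percentage of tokens in an evaluation corpus on which the feature is active; at 1.228%, that would be roughly one token in 80. The corpus and the activation threshold behind this figure are not stated here, so the sketch below simply assumes "active" means an activation strictly greater than a threshold that defaults to 0. The function name `activation_density` and the toy activations are hypothetical.

```python
# Hypothetical sketch: activation density as the percentage of tokens
# on which the feature's activation exceeds a threshold (0 by default).
from typing import Sequence


def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Percentage of positions where activation > threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy activations over 8 token positions; exactly one position fires.
acts = [0.0, 0.0, 2.3, 0.0, 0.0, 0.0, 0.0, 0.0]
density = activation_density(acts)
print(f"Activation Density: {density:.3f}%")
```

Raising `threshold` counts only stronger activations, so the reported density depends on that choice as well as on the corpus.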