INDEX
Explanations
Twitter links, specifically ones ending with ".com" and containing images
punctuation symbols, particularly periods
Negative Logits
Token            Logit
conclud          -0.79
perspect         -0.79
behavi           -0.78
challeng         -0.75
involuntary      -0.74
volunte          -0.71
retrospective    -0.71
untarily         -0.71
mosqu            -0.70
fountain         -0.70
Positive Logits
Token            Logit
(no token text)  1.09
(no token text)  1.07
(no token text)  1.06
com              1.02
github           1.02
cdn              0.99
gov              0.98
wik              0.97
nz               0.95
mk               0.95
Activations Density 0.171%
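
An activation density like the one above is commonly read as the share of sampled tokens on which the feature fires at all. The sketch below assumes that definition. The function name `activation_density`, the zero threshold, and the sample values are hypothetical and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption (not confirmed by the source page):
    density = (tokens with activation > threshold) / (all tokens).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 2 of 1,000 tokens activate the feature.
sample = [0.0] * 998 + [1.3, 0.4]
density = activation_density(sample)
print(f"{density:.3f}%")  # prints 0.200%
```

Under this reading, a density of 0.171% would correspond to roughly 17 active tokens in every 10,000 sampled.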