INDEX
Explanations
punctuation marks that indicate surprise or disbelief
Negative Logits
Token    Logit
?!       -1.07
?!?!     -0.82
?!"      -0.82
!?       -0.81
?!”      -0.77
?!       -0.72
?!?      -0.71
!?!      -0.71
?!!      -0.67
!?"      -0.63
Positive Logits
Token            Logit
__).             0.48
twimg            0.43
]')              0.40
})));            0.40
--}}             0.40
UniformLocation  0.40
]%               0.39
--}}             0.39
""));            0.38
]</              0.37
Activations Density: 0.004%