INDEX
Explanations
phrases related to news articles, investigations, and official reports
punctuation marks indicating the end of sentences or statements
Negative Logits
pudding      -0.76
fists        -0.74
extinct      -0.73
imperson     -0.72
backward     -0.72
impression   -0.71
tiger        -0.70
hands        -0.70
utter        -0.69
purse        -0.69
Positive Logits
sbm           0.95
Similarly     0.93
Additionally  0.91
Flavoring     0.90
Previously    0.89
Specifically  0.88
Furthermore   0.88
Along         0.88
Also          0.87
Moreover      0.87
Activations Density 0.253%