INDEX
Explanations
instances where someone is speaking or quoting something
references to negative speech or statements made about others
Negative Logits
Flickr      -0.79
Flickr      -0.70
edia        -0.62
Wikimedia   -0.62
ablished    -0.62
Mechanical  -0.61
lisher      -0.61
Flavoring   -0.60
ibal        -0.59
anding      -0.59
Positive Logits
aloud       1.65
loudly      1.31
goodbye     1.23
loud        1.19
louder      1.04
words       0.96
phrases     0.93
word        0.92
prayers     0.92
word        0.92
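The logit lists above are commonly produced by projecting a feature's output direction onto the model's unembedding vectors and sorting the resulting scores. The page does not state how its values were computed, so the following is only a minimal sketch under that assumption. The names `logit_effects` and `top_and_bottom`, and all toy vectors, are hypothetical and are not taken from the dashboard.

```python
def logit_effects(direction, unembed):
    """Dot product of a feature direction with each token's unembedding vector.

    direction: list of floats (hypothetical feature output direction)
    unembed:   dict mapping token string -> list of floats (hypothetical)
    """
    return {
        tok: sum(d * w for d, w in zip(direction, vec))
        for tok, vec in unembed.items()
    }


def top_and_bottom(effects, k):
    """Return the k most promoted and k most suppressed tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    positive = ranked[:k]
    negative = ranked[-k:][::-1]  # most negative first
    return positive, negative


# Toy two-dimensional example; these numbers are made up for illustration.
toy_direction = [1.0, 0.0]
toy_unembed = {
    "aloud": [1.5, 0.2],
    "words": [0.9, -0.3],
    "Flickr": [-0.8, 0.5],
}

effects = logit_effects(toy_direction, toy_unembed)
positive, negative = top_and_bottom(effects, k=1)
print("positive:", positive)
print("negative:", negative)
```

Under this reading, tokens such as "aloud" and "loudly" are the ones the feature pushes the model toward predicting, while the negative list holds the tokens it pushes against.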
Activations Density 0.197%
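Activation density is usually read as the fraction of sampled tokens on which the feature fires. The page does not define it, so the sketch below assumes "fires" means an activation strictly above a threshold of zero. The function name `activation_density` and the toy data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as active when its activation is
    strictly greater than the threshold (zero by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (made up): 1000 tokens, 2 of which activate the feature.
toy_activations = [0.0] * 998 + [1.3, 0.4]
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

On this reading, a density of 0.197% would mean the feature is active on roughly 2 of every 1,000 sampled tokens.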