INDEX
Explanations
titles of articles or pieces of writing
references to publications and their titles
Negative Logits
unts          -0.66
ordinate      -0.64
trophy        -0.63
handc         -0.62
activation    -0.61
stret         -0.61
ambul         -0.60
zers          -0.59
perimeter     -0.59
standby       -0.59
Positive Logits
excerpts      0.98
essays        0.93
published     0.87
blogs         0.82
blogs         0.82
satirical     0.81
plagiar       0.80
articles      0.79
blog          0.78
commentary    0.78
Activations Density 0.574%