INDEX
Explanations
dates, locations, and names of people
parentheses or closing punctuation marks in the document
Negative Logits
frog            -0.67
products        -0.63
users           -0.62
sembly          -0.61
cloves          -0.61
Ń·              -0.60
domestically    -0.60
sails           -0.60
burner          -0.59
spam            -0.59
Positive Logits
Actor           0.76
Madison         0.74
Associated      0.72
ILCS            0.71
Original        0.71
htaking         0.69
Correct         0.69
TAG             0.68
Letter          0.67
Ev              0.67
Activations Density 0.096%