Explanations
mentions of legal or criminal activities
Negative Logits
.","     -0.57
."       -0.56
+.       -0.56
.</      -0.55
$.       -0.53
+(       -0.52
*.       -0.51
milo     -0.51
..."     -0.50
[(       -0.50
Positive Logits
meanwhile     0.58
odore         0.55
resa          0.54
Canaver       0.52
GOODMAN       0.50
Chomsky       0.45
irony         0.45
transcript    0.45
HuffPost      0.44
Hopkins       0.44
Activations Density: 1.732%