Explanations
news-related phrases or statements
instances of citations or references to sources
Negative Logits
phony               -0.69
odox                -0.68
away                -0.65
backward            -0.62
eez                 -0.62
isSpecialOrderable  -0.60
athed               -0.59
eco                 -0.59
ãĥ¥                 -0.59
ific                -0.59
Positive Logits
"â̦                 0.79
although            0.72
"[                  0.71
citing              0.68
Kinnikuman          0.67
uthor               0.67
"(                  0.66
Fib                 0.66
Mehran              0.65
"...                0.64
Activations Density: 0.156%