Explanations
keywords relating to factual information and belief statements
Negative Logits
Reef        -0.65
Weston      -0.64
KS          -0.59
Watkins     -0.57
Allied      -0.56
Kau         -0.55
reprinted   -0.54
Fernandez   -0.54
Flore       -0.53
Cath        -0.52
Positive Logits
usterity    1.01
theless     0.99
\)          0.96
estine      0.95
'?          0.93
»           0.93
É           0.88
%"          0.88
terday      0.87
/,          0.86
Activations Density 0.454%