INDEX
Explanations
statements asserting or challenging claims
references to assertions or statements of belief
Negative Logits
srf         -0.95
cffffcc     -0.87
Watching    -0.80
arrang      -0.78
electing    -0.73
simul       -0.73
newcom      -0.72
etheless    -0.71
stocking    -0.71
destro      -0.65
Positive Logits
claims      1.10
Claim       1.03
claim       1.00
ifications  0.83
ylum        0.81
Claim       0.80
BILITY      0.80
ulent       0.79
Cheong      0.79
oux         0.79
Activations Density: 0.011%