INDEX
Explanations
phrases indicating evidence or demonstration
phrases related to evidence and demonstration of claims or observations
Negative Logits
ieties       -0.74
ortium       -0.72
isers        -0.71
warr         -0.67
shaw         -0.67
dunno        -0.67
uthor        -0.66
intending    -0.64
tic          -0.64
prone        -0.63
Positive Logits
Figures      0.79
Figure       0.76
vividly      0.74
Fig          0.74
ADRA         0.71
tangible     0.70
Watts        0.68
products     0.68
Fig          0.66
graphs       0.65
Activations Density: 0.304%