INDEX
Explanations
references to specific instances or examples
NEGATIVE LOGITS
Token         Logit
Phry          -0.69
httphttps     -0.64
writeField    -0.61
betweenstory  -0.61
shovels       -0.60
orch          -0.58
sherds        -0.57
gddr          -0.57
subgoals      -0.56
folios        -0.56
POSITIVE LOGITS
Token         Logit
particular    1.09
thing         0.85
particular    0.85
kind          0.81
stuff         0.79
guy           0.78
entire        0.72
daqui         0.71
wonderful     0.71
incredible    0.70
Activations Density 0.415%