INDEX
Explanations
multiple instances of the word "return."
Negative Logits
ussen        -0.86
Parenthood   -0.81
Cola         -0.71
esson        -0.69
vern         -0.66
aughtered    -0.66
creen        -0.66
essen        -0.64
disse        -0.63
rotein       -0.61
Positive Logits
ees           1.17
home          0.80
home          0.76
able          0.76
postage       0.73
ee            0.71
porting       0.69
prise         0.69
ance          0.68
ported        0.68
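The two lists rank tokens by how strongly this feature pushes their logits down (negative) or up (positive). In SAE dashboards such per-token scores are typically obtained by projecting the feature's decoder direction through the model's unembedding. The sketch below does not reproduce that step: it assumes the scores are already available as a hypothetical list of (token, value) pairs and only shows the ranking that produces a bottom-k and top-k list.

```python
# Minimal sketch: rank a feature's per-token logit effects and keep the
# k most negative and k most positive entries.
# ASSUMPTION: `effects` is a hypothetical list of (token, score) pairs;
# how the dashboard computes the scores is not shown here.

def top_logit_effects(effects, k):
    """Return (most_negative, most_positive), each a list of (token, score)."""
    ascending = sorted(effects, key=lambda pair: pair[1])
    descending = sorted(effects, key=lambda pair: pair[1], reverse=True)
    return ascending[:k], descending[:k]

# A subset of the values listed above, for illustration only.
# A list (not a dict) is used so that repeated tokens such as "home" survive.
effects = [
    ("ussen", -0.86), ("Parenthood", -0.81), ("Cola", -0.71),
    ("ees", 1.17), ("home", 0.80), ("home", 0.76), ("able", 0.76),
]
neg, pos = top_logit_effects(effects, k=2)
print(neg)
print(pos)
```

With k=2 this yields ussen and Parenthood on the negative side and ees and home (0.80) on the positive side.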
Activation density: 0.017%
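Activation density is the share of tokens on which the feature fires at all, so 0.017% marks this as a very sparse feature. The sketch below assumes "fires" means a strictly positive activation and uses made-up toy activations (17 firing tokens out of 100,000), chosen only so that the result matches the figure above.

```python
# Minimal sketch: activation density as the percentage of tokens on which
# a feature's activation is strictly positive.
# ASSUMPTION: the "> 0" threshold and the toy data below are illustrative.

def activation_density(activations):
    """Percentage of entries in `activations` that are greater than zero."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)

# Toy example: 17 nonzero activations among 100,000 tokens.
acts = [0.0] * 100_000
for i in range(17):
    acts[i * 5000] = 1.0

print(f"{activation_density(acts):.3f}%")
```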