INDEX
Explanations
the word "out" or variations of it, such as "outs" and "OUT"
variations of the word "out."
Negative Logits
arsen                             -0.90
avorite                           -0.74
ajo                               -0.68
anguage                           -0.68
--------------------------------  -0.65
subp                              -0.65
downward                          -0.65
ε                                 -0.63
trem                              -0.63
ute                               -0.63
Positive Logits
doors       1.01
lier        0.96
dated       0.94
landish     0.94
door        0.94
fitted      0.92
stretched   0.92
raged       0.89
numbered    0.89
fits        0.87
Activations Density: 0.036%