Explanations
technical terms or acronyms
instances of the word "for" in various contexts
Negative Logits
inese     -0.68
illin     -0.67
jaws      -0.64
Tube      -0.58
âĶ        -0.56
erous     -0.56
vom       -0.56
marine    -0.56
beware    -0.55
ttle      -0.55
Positive Logits
bidden    1.51
gotten    1.50
ced       1.04
example   1.04
WARD      1.02
cing      1.01
instance  1.01
give      0.99
gery      0.99
bid       0.97
Activations Density 0.095%