INDEX
Explanations
repeated instances of the word "the", tracking its frequency
Negative Logits
ramework    -0.16
abant       -0.14
ramer       -0.14
resembl     -0.14
há          -0.14
Subjects    -0.14
ilim        -0.14
reater      -0.13
šit         -0.13
blat        -0.13
Positive Logits
/to         0.20
alto        0.15
iena        0.15
usta        0.15
outset      0.15
standpoint  0.14
دواج        0.14
Weiner      0.14
cache       0.14
owitz       0.14
Activations Density 0.125%