Explanations
instances of the word "mention" and its variations in context
Negative Logits
Token       Logit
oen         -0.15
idal        -0.14
dez         -0.14
nze         -0.14
ylum        -0.14
topl        -0.13
raith       -0.13
anas        -0.13
Unhandled   -0.13
kup         -0.13
Positive Logits
Token       Logit
erdale      0.19
isan        0.15
ullet       0.15
ırak        0.15
isky        0.14
ipo         0.14
Soph        0.14
ecta        0.14
åIJĽ         0.14
erva        0.14
Activations Density 0.006%