Explanations
repetitions of the substring "ut" and related variations
Negative Logits
es      -0.21
ož      -0.18
e       -0.18
halt    -0.17
hawks   -0.16
522     -0.16
eur     -0.16
oğlu    -0.16
o       -0.16
ingo    -0.16
Positive Logits
ters      0.24
tle       0.24
ty        0.23
тє        0.22
ti        0.22
ritional  0.21
cheon     0.20
ted       0.20
tim       0.19
ten       0.19
Activations Density 0.039%