INDEX
Explanations
the letter "T" in various contexts, indicating a focus on this specific character
Negative Logits
ouch    -0.24
ube     -0.23
asks    -0.21
akes    -0.21
ime     -0.20
ask     -0.20
ipo     -0.20
imer    -0.19
ools    -0.19
iny     -0.18
Positive Logits
acon    0.23
dap     0.18
juana   0.18
etz     0.17
usz     0.16
nem     0.16
olan    0.16
ilton   0.15
eg      0.15
alm     0.15
Activations Density 0.032%