INDEX
Explanations
the presence of the letter "L" in various contexts or combinations
Negative Logits
ane        -0.19
ine        -0.17
ink        -0.17
ace        -0.17
aus        -0.17
Swinger    -0.17
SB         -0.16
im         -0.16
asty       -0.15
IB         -0.15
Positive Logits
alu        0.23
ighth      0.19
iveness    0.19
oris       0.19
ateral     0.18
otta       0.17
apsed      0.17
oka        0.17
erne       0.17
ulu        0.17
Activations Density: 0.098%