Explanations
references to the number six and related quantities
Negative Logits
aren        -0.17
led         -0.16
ont         -0.16
ived        -0.16
ishments    -0.15
aira        -0.15
iams        -0.14
558         -0.14
ports       -0.14
ajan        -0.14
Positive Logits
teenth      0.32
ties        0.28
teen        0.26
ti          0.24
ty          0.23
つの        0.20
th          0.19
-figure     0.19
sense       0.19
U+FE0F (variation selector-16)    0.19
Activations Density 0.080%