INDEX
Explanations
the mention of the number "six" or related phrases
Negative Logits
ont        -0.17
lint       -0.16
439        -0.16
Kushner    -0.16
ports      -0.15
iams       -0.15
ived       -0.15
aren       -0.15
âĶĶ        -0.15
558        -0.15
Positive Logits
teenth     0.36
ties       0.33
teen       0.30
ti         0.28
ty         0.26
six        0.20
TY         0.19
sense      0.19
Flags      0.19
tee        0.19
Activations Density: 0.105%