INDEX
Explanations
phrases indicating a transition in a text
punctuation that indicates the end of a sentence
Negative Logits
Lew          -0.84
Flavoring    -0.81
bis          -0.76
iddles       -0.72
*)           -0.72
ARM          -0.71
ARS          -0.70
VL           -0.70
ĪĴ           -0.70
spir         -0.70
Positive Logits
day          1.12
Day          0.75
DAY          0.75
stride       0.73
time         0.72
enegger      0.72
day          0.71
benchmark    0.68
audiences    0.66
peed         0.66
Activations Density 0.000%