INDEX
Explanations
timestamps
punctuation marks, particularly periods
NEGATIVE LOGITS
UCT         -0.71
execut      -0.64
Maw         -0.58
enegger     -0.58
anmar       -0.58
persecuted  -0.57
longevity   -0.57
fors        -0.56
spo         -0.55
conqu       -0.55
POSITIVE LOGITS
Downloadha  0.87
            0.77
gran        0.71
css         0.70
EDT         0.68
meter       0.68
1500        0.66
route       0.65
req         0.63
org         0.62
Activations Density 0.024%