Explanations
Occurrences of the word "Col" along with related discussion points
Negative Logits
ehler     -0.17
ÅĻiv      -0.15
zes       -0.15
esk       -0.15
ega       -0.14
ead       -0.14
egend     -0.14
idebar    -0.14
unfold    -0.14
.Ptr      -0.13
Positive Logits
lier       0.35
leen       0.34
fax        0.31
lette      0.30
ombo       0.29
liers      0.28
burn       0.28
chester    0.27
borne      0.27
vin        0.26
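Every positive-logit token reads as a continuation of "Col" (Collier, Colleen, Colfax, and so on), which fits the explanation. The following is a minimal Python sketch of how such top and bottom lists could be picked out. It assumes each listed value is the feature's effect on that token's logit and that the dashboard simply shows the k largest and k smallest values. `logit_effects` holds only the twenty values listed here, not the full vocabulary, and `top_and_bottom` is a hypothetical helper, not a Neuronpedia API.

```python
import heapq

# The twenty logit effects listed on this page (the real ranking would cover
# the whole vocabulary). Assumption: value = feature's effect on that token's logit.
logit_effects = {
    "ehler": -0.17, "ÅĻiv": -0.15, "zes": -0.15, "esk": -0.15, "ega": -0.14,
    "ead": -0.14, "egend": -0.14, "idebar": -0.14, "unfold": -0.14, ".Ptr": -0.13,
    "lier": 0.35, "leen": 0.34, "fax": 0.31, "lette": 0.30, "ombo": 0.29,
    "liers": 0.28, "burn": 0.28, "chester": 0.27, "borne": 0.27, "vin": 0.26,
}

def top_and_bottom(effects, k):
    """Hypothetical helper: return the k most positive and k most negative (token, value) pairs."""
    top = heapq.nlargest(k, effects.items(), key=lambda kv: kv[1])
    bottom = heapq.nsmallest(k, effects.items(), key=lambda kv: kv[1])
    return top, bottom

top, bottom = top_and_bottom(logit_effects, 3)
print(top)                             # [('lier', 0.35), ('leen', 0.34), ('fax', 0.31)]
print(["Col" + tok for tok, _ in top]) # ['Collier', 'Colleen', 'Colfax']
print(bottom)
```

Prefixing "Col" to the top tokens makes the pattern explicit: the feature appears to promote the second half of words and names that begin with "Col".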
Activation density: 0.010%
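Assuming activation density means the fraction of token positions on which the feature's activation is above zero, 0.010% corresponds to roughly one firing per 10,000 tokens. The following is a minimal sketch of that calculation on hypothetical data. `activation_density` is an illustrative helper, not a dashboard function, and the activation values are made up.

```python
def activation_density(activations):
    """Hypothetical helper: fraction of token positions where the activation is > 0."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > 0)
    return firing / len(activations)

# Hypothetical data: the feature fires on 1 of 10,000 token positions.
acts = [0.0] * 10_000
acts[42] = 3.1
density = activation_density(acts)
print(f"{density:.3%}")  # 0.010%
```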