Explanations
symbols and punctuation, particularly ones related to questions and mentions
Negative Logits
¥ŀ             -0.90
monton         -0.79
undai          -0.77
ocr            -0.75
iscopal        -0.74
outheastern    -0.74
deen           -0.72
elaide         -0.71
iren           -0.71
opolis         -0.71
Positive Logits
crates         0.72
Crate          0.66
Pledge         0.63
exclus         0.62
armor          0.61
Recipes        0.61
crate          0.60
discounted     0.60
Fav            0.59
Valentine      0.59
Activations Density: 0.014%