INDEX
Explanations
proper nouns, specifically those related to saints and geographic locations
Negative Logits
Token          Logit
WriteBarrier   -0.42
ulipas         -0.42
cookieParser   -0.41
guste          -0.41
gustaría       -0.40
AndEndTag      -0.38
Diweddarwch    -0.37
listdir        -0.36
Rptr           -0.36
蝉             -0.36

Positive Logits
Token          Logit
Saint          0.84
Saint          0.79
Sainte         0.78
SAINT          0.78
saint          0.70
saints         0.67
Sainte         0.66
Saints         0.65
Sankt          0.65
saint          0.63

Activations Density: 0.027%