INDEX
Explanations
occurrences of the token "<bos>", signaling the beginning of new sections or paragraphs
Negative Logits
TagMode       -0.64
Rhestr        -0.56
Nestor        -0.54
itſelf        -0.53
🟤            -0.52
antidesliz    -0.50
goddesses     -0.50
yourself      -0.49
yves          -0.49
chargez       -0.48
Positive Logits
IsContent        0.88
Personendaten    0.80
bezeichneter     0.78
WebVitals        0.70
utafitiHapana    0.69
èdia             0.67
writeField       0.65
CanadaChoose     0.63
NameInMap        0.62
ویکیپدی          0.62
Activations Density: 0.913%