INDEX
Explanations
the beginning of a document or text, indicating the start of a significant section
Negative Logits
titleMargin         -1.16
featureID           -1.07
Personendaten       -1.05
webElementXpaths    -1.04
NameInMap           -1.01
Walkover            -1.00
parsedMessage       -1.00
ItemBackground      -0.98
Vidite              -0.98
kháu                -0.98
Positive Logits
гова                0.43
↵                   0.42
[toxicity=0]        0.39
Smyth               0.39
↵↵                  0.37
#                   0.37
(token not shown)   0.37
Dan                 0.36
he                  0.36
人都                0.36
Activations Density 0.889%