INDEX
Explanations
controversial or critical statements
punctuation and formatting-related elements in the text
Negative Logits
adden: -0.60
tones: -0.55
boro: -0.54
heit: -0.49
ici: -0.48
tongue: -0.48
ildo: -0.48
inclusion: -0.47
packages: -0.47
ell: -0.47
Positive Logits
Nevertheless: 0.70
Nonetheless: 0.69
••: 0.67
Meanwhile: 0.67
Still: 0.66
Finally: 0.65
Likewise: 0.64
Attempts: 0.64
Similarly: 0.63
ccording: 0.62
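The negative and positive logit lists above resemble what one gets by projecting a feature's decoder direction through the model's unembedding matrix and keeping the most extreme scores. The page does not say how its values were computed, so the following is only a minimal pure-Python sketch under that assumption; `logit_effects`, `direction`, `w_u`, and `vocab` are hypothetical names.

```python
def logit_effects(direction, w_u, vocab, k=10):
    """Score every vocabulary token by direction @ w_u and return the
    k lowest-scoring and k highest-scoring tokens.

    Assumed shapes (hypothetical): direction has d_model floats,
    w_u is a d_model x vocab_size list of rows, vocab has vocab_size strings.
    """
    d_model = len(direction)
    vocab_size = len(vocab)
    # Dot product of the feature direction with each unembedding column.
    scores = [
        sum(direction[r] * w_u[r][j] for r in range(d_model))
        for j in range(vocab_size)
    ]
    # Token indices sorted from most negative to most positive score.
    order = sorted(range(vocab_size), key=lambda j: scores[j])
    negative = [(vocab[j], scores[j]) for j in order[:k]]
    # Most positive first, matching the descending "Positive Logits" list.
    positive = [(vocab[j], scores[j]) for j in reversed(order[-k:])]
    return negative, positive


# Toy example with d_model = 2 and a four-token vocabulary.
neg, pos = logit_effects(
    [1.0, 0.0],
    [[0.5, -0.2, 0.1, -0.6], [0.3, 0.3, 0.3, 0.3]],
    ["a", "b", "c", "d"],
    k=2,
)
```

Sorting once and slicing both ends keeps the two lists consistent with each other; for a realistic vocabulary a partial selection (for example `heapq.nsmallest`/`nlargest`) would avoid a full sort.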
Activation Density: 0.980%
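Activation density is commonly reported as the percentage of token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of zero (the page does not state its threshold); `activation_density` is a hypothetical helper.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# A feature active on 49 of 5,000 tokens has a density of 0.98%.
sample = [1.0] * 49 + [0.0] * (5000 - 49)
density = activation_density(sample)
```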