Explanations
Keywords related to misinformation or inaccuracies
Instances of high numerical values or scores in the context of action or compliance
Negative Logits
endeavour         -0.67
utical            -0.66
igi               -0.63
equip             -0.62
earch             -0.61
assignment        -0.61
undermin          -0.61
Shinra            -0.58
Reincarnated      -0.58
deceived          -0.58
Positive Logits
³³³³              0.99
³³³³³³³³³³³³³³³³  0.92
³³³³³³³³          0.86
SPONSORED         0.85
³³³               0.85
Newsletter        0.83
³³                0.81
Commercial        0.80
Writing           0.77
Posted            0.76
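The two lists above rank tokens by how strongly the feature pushes their next-token logits down (negative) or up (positive). A minimal sketch of how such lists can be produced follows. It assumes the per-token effect is the dot product of the feature's decoder direction with the unembedding matrix. The names `decoder_dir`, `W_U`, and the toy numbers are hypothetical, and real dashboards may apply extra scaling or normalization that is not shown here:

```python
import numpy as np

def logit_effects(decoder_dir: np.ndarray, W_U: np.ndarray) -> np.ndarray:
    """Direct effect of one feature on every vocabulary logit (assumed definition).

    decoder_dir: shape (d_model,), the feature's decoder direction (hypothetical name).
    W_U: shape (d_model, vocab_size), the unembedding matrix (hypothetical name).
    """
    return decoder_dir @ W_U

def top_and_bottom(effects: np.ndarray, vocab: list, k: int = 10):
    """Return the k most negative and k most positive (token, effect) pairs."""
    order = np.argsort(effects)  # indices sorted by ascending effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: d_model = 2, vocabulary of 4 tokens (all values made up).
vocab = ["a", "b", "c", "d"]
W_U = np.array([[0.2, -0.4, 0.9, 0.1],
                [0.6, 0.2, -0.2, 1.0]])
decoder_dir = np.array([1.0, -0.5])

effects = logit_effects(decoder_dir, W_U)
negative, positive = top_and_bottom(effects, vocab, k=2)
print("Negative:", negative)
print("Positive:", positive)
```

Sorting once with `argsort` and reading both ends keeps the two lists consistent with each other, which mirrors the layout of the Negative and Positive Logits tables.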
Activation Density: 1.001%
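Activation density is commonly read as the percentage of sampled tokens on which the feature fires. A minimal sketch follows, assuming "fires" means an activation strictly above a threshold of zero. The function name, the threshold, and the synthetic activations are illustrative assumptions, not the dashboard's own code:

```python
import numpy as np

def activation_density(acts: np.ndarray, threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    acts = np.asarray(acts)
    firing = np.count_nonzero(acts > threshold)  # number of tokens where the feature is active
    return 100.0 * firing / acts.size

# Synthetic example: the feature is active on 10 out of 1,000 tokens.
acts = np.zeros(1000)
acts[:10] = 0.5
print(f"Activation Density: {activation_density(acts):.3f}%")
```

On this reading, a density of 1.001% means the feature is active on roughly 1 token in 100.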