INDEX
Explanations
content related to political and social discussions
phrases related to significant political events or decisions
NEGATIVE LOGITS
anwhile      -0.69
)."          -0.62
).[          -0.57
therefore    -0.55
'."          -0.53
.'"          -0.53
meanwhile    -0.52
.).          -0.49
however      -0.49
]."          -0.48
POSITIVE LOGITS
Canaver       0.55
Spoiler       0.54
FAQ           0.49
Spoiler       0.48
ensical       0.47
precon        0.46
unpre         0.46
JPM           0.45
ONY           0.44
trolling      0.44
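The ranked positive and negative logit lists above can be produced by scoring every vocabulary token by the feature's direct effect on the output logits, then sorting. The sketch below assumes, without confirmation from this dashboard, that the effect is the feature's decoder direction projected onto the unembedding matrix (`decoder_dir @ W_U`). The function name `top_bottom_logits` and its parameters are hypothetical and chosen for illustration.

```python
import numpy as np


def top_bottom_logits(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by one feature's direct logit effect (hypothetical helper).

    Assumption: effect = decoder_dir @ W_U, where decoder_dir has shape
    (d_model,) and W_U has shape (d_model, vocab_size).
    Returns (negative, positive) lists of (token, value) pairs.
    """
    effects = decoder_dir @ W_U          # one score per vocabulary token
    order = np.argsort(effects)          # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive
```

Sorting once and reading from both ends avoids a second pass over the vocabulary. When `k` exceeds the vocabulary size, both lists simply contain every token.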
Activation density: 4.356%
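Activation density is commonly read as the share of sampled tokens on which the feature fires. The sketch below assumes that definition, with "fires" meaning an activation strictly above a threshold of 0. Neither the dataset nor the threshold behind the 4.356% figure is stated here. The helper name `activation_density` is hypothetical.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: density = (# tokens with activation > threshold) / (# tokens).
    """
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size
```

For example, activations `[0.0, 0.3, 0.0, 1.2]` give a density of 50.0%, because the feature is active on two of four tokens.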