Explanations
information related to controversial topics and debates
assertions or beliefs about significant historical or political events
Negative Logits
summarizes    -0.56
cellaneous    -0.52
swick         -0.52
bilt          -0.51
partName      -0.49
wrapper       -0.49
ãĤ´ãĥ³        -0.48
inis          -0.48
adel          -0.47
ogether       -0.46
Positive Logits
unfairly            0.65
unconstitutional    0.62
imminent            0.60
beneficial          0.59
racist              0.59
"))                 0.57
harmful             0.57
plagiar             0.55
sexist              0.55
pedoph              0.55
Activations Density: 1.486%