Explanations
phrases related to political topics
Negative Logits
Token       Logit
>>>>>>>>    -0.62
LAN         -0.61
Comb        -0.59
Tues        -0.59
aco         -0.58
Dating      -0.58
roundup     -0.57
ibal        -0.57
sorts       -0.57
Fresh       -0.56
Positive Logits
Token       Logit
iris        0.97
perceive    0.86
oppose      0.81
rir         0.80
partake     0.79
whom        0.78
offend      0.78
consume     0.76
"$:/        0.74
harmed      0.74
Activations Density: 0.272%