Explanations
divisive and controversial topics
Negative Logits
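appBar  0.71
बलात्कार (Hindi, "rape")  0.66
стомато (Russian word fragment, "stomato-")  0.63
শাস্তি (Bengali, "punishment")  0.63
MyAdmin  0.62
steals  0.61
steal  0.61
꿇 (Korean syllable, as in 꿇다, "to kneel")  0.59
Buildable  0.59
বিদ্যা (Bengali, "knowledge")  0.59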
Positive Logits
polarization  2.34
polarized  2.30
polarisation  2.11
polarised  2.10
polar  2.10
divides  2.08
polarizing  2.04
divisions  2.04
divide  2.01
Polarization  1.98
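Logit lists like the two above are commonly produced by projecting a feature's decoder direction onto the model's unembedding and ranking tokens by the result. The sketch below assumes that approach. It is not necessarily the exact pipeline behind these numbers. The function names, the dict-based unembedding, and the toy vectors are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: the feature's decoder direction (length d_model)
# and an unembedding stored as token -> column of W_U (also length d_model).
Vector = Sequence[float]


def logit_effects(decoder_dir: Vector, unembed: Dict[str, Vector]) -> Dict[str, float]:
    """Dot the decoder direction with each token's unembedding column."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, column))
        for token, column in unembed.items()
    }


def top_logits(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most positive, k most negative) (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1])
    negative = ranked[:k]          # smallest (most negative) effects first
    positive = ranked[::-1][:k]    # largest (most positive) effects first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional example; these vectors are invented for illustration.
    decoder_dir = [1.0, 0.5]
    unembed = {
        "polarization": [2.0, 0.5],
        "divide": [1.5, 1.0],
        "appBar": [-0.5, -0.4],
        "steal": [-0.3, -0.5],
    }
    positive, negative = top_logits(logit_effects(decoder_dir, unembed), k=2)
    print("positive:", positive)
    print("negative:", negative)
```

In the toy run, "polarization" (2.25) and "divide" (2.0) come out on the positive side, and "appBar" and "steal" come out on the negative side. Sorting once and slicing both ends avoids ranking the vocabulary twice.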
Activation Density 0.245%
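Activation density is the fraction of sampled token positions on which the feature fires. At 0.245%, it is active on roughly 2.45 of every 1,000 tokens. The sketch below assumes "fires" means an activation strictly above zero. The threshold parameter and the sample data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the activation exceeds `threshold`."""
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Invented sample: 245 active positions out of 100,000 tokens.
    sample = [0.0] * 99_755 + [1.3] * 245
    density = activation_density(sample)
    print(f"{density * 100:.3f}%")  # prints 0.245%
```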