INDEX
Explanations
key figures and organizations associated with social or political movements
Negative Logits
Inactive    -0.17
ulan        -0.17
olo         -0.14
orthy       -0.14
IQ          -0.13
wyn         -0.13
uv          -0.13
oz          -0.13
ỳ           -0.13
ØŃÙĬÙĨ      -0.13
Positive Logits
instead     0.48
opposite    0.38
Instead     0.38
instead     0.37
Instead     0.37
naopak      0.37
contrary    0.28
Nope        0.26
contrario   0.25
ngược       0.25
Activations Density 0.338%