INDEX
Explanations
phrases that express critiques or discussions about government and society
Negative Logits
Token      Logit
eya        -0.18
rush       -0.16
rega       -0.15
berapa     -0.15
reck       -0.15
reckon     -0.14
onces      -0.14
World      -0.14
bara       -0.14
prez       -0.13
Positive Logits
Token      Logit
esson      0.16
semiclass  0.15
nackte     0.15
円         0.15
grav       0.15
steller    0.14
citiz      0.14
قد         0.14
ovat       0.14
-unstyled  0.13
Activations Density 0.018%