Explanations
highly impactful statements related to government, politics, and international affairs
Negative Logits
poons        -0.86
poon         -0.74
ayson        -0.72
aukee        -0.71
APR          -0.71
APPLIC       -0.69
Carbuncle    -0.69
veyard       -0.67
Thurs        -0.66
jen          -0.66
Positive Logits
hood          0.99
less          0.95
rooms         0.94
lessness      0.91
legislatures  0.91
wide          0.87
governments   0.86
apparatus     0.85
room          0.83
repression    0.82
Activations Density 0.048%