INDEX
Explanations
phrases related to politics and government
references to specific influential entities or individuals in various contexts
Negative Logits
raved         -0.64
Written       -0.60
GI            -0.59
angering      -0.59
ishable       -0.59
Writer        -0.59
igrants       -0.58
versible      -0.57
Written       -0.57
Isles         -0.57
Positive Logits
incidentally  0.73
notoriously   0.69
famously      0.69
moreover      0.67
adj           0.65
âķIJ           0.65
basically     0.63
notor         0.62
itself        0.62
however       0.62
Activations Density: 0.926%