Explanations
government-related text, including mentions of government agencies, applications developed by government agencies, government portals, and administrative processes
Negative Logits
forcer      -0.59
¯           -0.58
Darius      -0.57
affer       -0.55
aurus       -0.54
Secondly    -0.52
anny        -0.52
fourth      -0.51
Trailer     -0.50
Adin        -0.50
Positive Logits
themselves  0.81
careers     0.75
varying     0.72
respective  0.70
variants    0.68
osponsors   0.68
specialize  0.68
histories   0.66
surn        0.66
theirs      0.64
Activations Density 0.798%