INDEX
Explanations
organizations or entities that are identified as "think tanks."
terms related to political ideologies and think tanks
Negative Logits
Ô                 -0.90
theless           -0.82
Helpful           -0.81
Detected          -0.77
Output            -0.74
TPPStreamerBot    -0.69
Towards           -0.67
[+]               -0.65
Decay             -0.64
attribution       -0.64
Positive Logits
ocrin             0.83
arist             0.76
camp              0.76
ops               0.75
agent             0.75
estate            0.73
mans              0.72
fund              0.72
inder             0.71
intern            0.71
Activations Density 0.498%