Explanations
references to political threats and security concerns
Negative Logits
pper    -0.16
xygen   -0.15
antz    -0.15
ÐĴС     -0.14
αÏģά    -0.14
Thunk   -0.14
.Prot   -0.13
ervas   -0.13
Ù쨶     -0.13
assi    -0.13
Positive Logits
pose        0.87
poses       0.80
posing      0.70
Pose        0.70
posed       0.70
pose        0.66
Pose        0.64
poses       0.62
presents    0.62
presenting  0.53
Activations Density 0.054%