Explanations
words related to espionage or spy activities
Negative Logits
Token     Logit
urses     -0.79
ickr      -0.77
ktop      -0.76
aii       -0.72
keye      -0.71
artney    -0.69
TAIN      -0.69
agne      -0.68
ourse     -0.65
tin       -0.65
Positive Logits
Token     Logit
moon      1.03
loo       0.95
boxing    0.94
runners   0.90
fax       0.83
stats     0.82
flame     0.81
hun       0.77
shadow    0.75
runner    0.74
Activations Density 0.016%