Explanations
mentions of mock scenarios or representations
references to mockumentaries
Negative Logits
Horizon   -0.79
cryst     -0.78
FORM      -0.72
arrang    -0.72
violet    -0.66
safegu    -0.63
hips      -0.62
ither     -0.61
vested    -0.61
è£ħ       -0.60
Positive Logits
ument     1.11
eries     0.98
tails     0.93
ups       0.93
ingly     0.90
Mock      0.89
ery       0.88
atory     0.83
uppet     0.81
crow      0.77
Activations Density: 0.033%