INDEX
Explanations
references to specific names and titles
sequences of letters or characters indicative of specific phrases or names
Negative Logits
raped           -0.68
acute           -0.61
Akin            -0.58
specificity     -0.57
respectively    -0.56
cause           -0.54
afterlife       -0.54
overboard       -0.52
horr            -0.52
shame           -0.52
Positive Logits
phant           0.89
imaru           0.85
velt            0.81
tes             0.75
merce           0.74
Unix            0.72
nic             0.72
Redditor        0.72
SpaceEngineers  0.71
ois             0.70
Activations Density: 0.353%