INDEX
Explanations
references to the word "trap" or related concepts
occurrences of the word "dri" in various contexts, especially related to names and titles
Negative Logits
ivas    -0.86
Osw     -0.85
ikarp   -0.81
istan   -0.81
psey    -0.76
acular  -0.76
umble   -0.75
anic    -0.73
yip     -0.73
eneg    -0.72
Positive Logits
zzy           0.78
ULTS          0.77
============  0.76
GBT           0.74
vable         0.73
zzle          0.70
abeth         0.69
zz            0.69
========      0.68
REDACTED      0.67
Activations Density 0.080%