Explanations
the presence of a specific token structure, primarily a recurring pattern in the text
Negative Logits
Kislyak    -0.70
verett     -0.67
mble       -0.63
andum      -0.63
Greenwald  -0.62
Qiao       -0.62
aukee      -0.62
Canaver    -0.61
Cheong     -0.59
TIME       -0.58
Positive Logits
ouse       1.10
ulhu       0.93
orse       0.88
rift       0.87
ttp        0.86
iop        0.85
some       0.85
orne       0.83
orst       0.82
shire      0.82
Activations Density: 0.005%