INDEX
Explanations
the word "this" within the context of a comparison or explanation
repetitive mentions of the word "this."
Negative Logits
istries   -0.80
omo       -0.74
anamo     -0.73
amia      -0.71
assian    -0.71
ARS       -0.70
isms      -0.69
pots      -0.69
endez     -0.69
agi       -0.69
Positive Logits
trope         0.98
particular    0.93
arrangement   0.86
incarnation   0.86
latest        0.85
iteration     0.85
week          0.83
latter        0.83
newfound      0.83
discrepancy   0.83
Activations Density: 0.206%