INDEX
Explanations
words or phrases related to specific names or titles
the repeated mention of a particular linguistic particle or character
Negative Logits
reflex      -0.79
metic       -0.76
favor       -0.75
grazing     -0.74
spir        -0.73
clitor      -0.72
favour      -0.72
scram       -0.71
clipping    -0.70
saddle      -0.69
Positive Logits
ï¸ı         1.26
Because     0.97
ecause      0.90
ï¸          0.90
They        0.89
sure        0.87
Unless      0.87
It          0.86
Sure        0.86
Therefore   0.86
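The tokens "ï¸ı" and "ï¸" in the positive list look like encoding debris, but they are consistent with how a byte-level BPE vocabulary displays raw bytes. The sketch below assumes, without confirmation from this page, that the model uses a GPT-2-style byte-to-character table; the helper names bytes_to_unicode and token_to_bytes are illustrative only. Under that assumption the first token maps back to the three UTF-8 bytes of U+FE0F (variation selector-16), and the second to its first two bytes.

```python
def bytes_to_unicode():
    """Byte -> printable character table in the style of GPT-2 byte-level BPE.

    Assumption: the dashboard's tokenizer follows this scheme.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    mapping = {b: chr(b) for b in printable}
    shift = 0
    for b in range(256):
        if b not in mapping:
            # Non-printable bytes are shifted to code points from 256 upward.
            mapping[b] = chr(256 + shift)
            shift += 1
    return mapping


CHAR_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def token_to_bytes(token):
    """Map a displayed token string back to the raw bytes it stands for."""
    return bytes(CHAR_TO_BYTE[ch] for ch in token)


# The two odd-looking tokens, written with escapes to avoid copy/paste damage.
full = token_to_bytes("\u00ef\u00b8\u0131")
partial = token_to_bytes("\u00ef\u00b8")

print(full.hex(), repr(full.decode("utf-8")))
print(partial.hex())
```

If the assumption holds, the top positive logit is an invisible emoji variation selector rather than garbled text, so the token strings should be kept exactly as shown.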
Activations Density 0.137%
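The page does not define activation density. A common reading, assumed here, is the fraction of sampled token positions on which the feature fires at all. The function name activation_density and the toy data are hypothetical; the counts are chosen only so the arithmetic reproduces a figure of the same size as the one above.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions with a nonzero
    (positive) activation; the dashboard may define it differently.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data, not taken from the dashboard: 137 active positions in 100,000.
acts = [0.0] * 99_863 + [1.5] * 137
density = activation_density(acts)
formatted = f"{density:.3%}"
print(formatted)
```

On this reading, a density of 0.137% means the feature is active on roughly one token in every 730.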