Explanations
repeated mentions of insurance and related terms
Negative Logits
Majefty     -1.19
Chriftian   -1.15
Diſ         -1.11
Monfieur    -1.09
Eſ          -1.07
poffible    -1.06
itſelf      -1.06
myſelf      -1.05
raiſ        -1.04
Reſ         -1.04
Positive Logits
attention   0.75
acute       0.65
insurance   0.61
Acute       0.60
ate         0.56
aten        0.55
Attention   0.54
Acute       0.53
ATTENTION   0.52
attention   0.52
Activations Density: 0.112%