INDEX
Explanations
mentions of a specific name or term "Henrik"
repetitive mentions of a specific name or term
Negative Logits
foundation     -0.68
stones         -0.64
selves         -0.63
milo           -0.63
ecause         -0.63
actresses      -0.62
matically      -0.62
fide           -0.62
miscarriage    -0.61
Squirrel       -0.60
Positive Logits
ernel          0.87
umar           0.86
rish           0.84
anth           0.84
ipedia         0.82
anke           0.82
jen            0.81
ugal           0.78
ku             0.78
ula            0.76
Activations Density 0.017%