Explanations
mentions of the name "Karl" with varying levels of activation
the name "Karl."
Negative Logits
Token      Logit
flix       -0.77
ndra       -0.71
NX         -0.69
Dangerous  -0.67
Called     -0.66
nder       -0.64
LV         -0.63
Doct       -0.62
Mub        -0.61
Karma      -0.60
Positive Logits
Token       Logit
ounge       1.15
anguage     1.10
ophone      1.02
owship      1.00
ength       0.94
oths        0.91
ibrary      0.91
ottesville  0.91
otta        0.90
atan        0.89
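A minimal sketch of how top and bottom logit lists like these are commonly produced, assuming each value is the raw dot product of the feature's decoder direction with one column of the model's unembedding matrix (the dashboard may apply additional normalization that is not shown here). The function `top_logit_effects`, its parameters, and the toy matrices are hypothetical stand-ins for real model weights.

```python
import numpy as np

def top_logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Project a feature's decoder direction through the unembedding matrix
    and return the k most negative and k most positive (token, logit) pairs.

    Assumed shapes (hypothetical): w_dec_row is (d_model,),
    w_unembed is (d_model, vocab_size), vocab is a list of vocab_size strings.
    """
    logits = w_dec_row @ w_unembed          # one logit effect per vocabulary token
    order = np.argsort(logits)              # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
w_unembed = np.array([[1.0, 0.0, -1.0, 2.0],
                      [0.0, 1.0,  0.0, 0.0]])
w_dec_row = np.array([1.0, 0.5])
negative, positive = top_logit_effects(w_dec_row, w_unembed, vocab, k=2)
print(negative)  # [('c', -1.0), ('b', 0.5)]
print(positive)  # [('d', 2.0), ('a', 1.0)]
```

In a real model the resulting tokens are subword pieces, which is why entries such as "ounge" or "ottesville" appear as fragments rather than whole words.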
Activation Density: 0.019%
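Activation density is usually read as the fraction of token positions at which the feature fires. The sketch below assumes "fires" means an activation strictly greater than zero; the dashboard's exact threshold and token sample are not stated here. The function `activation_density` and the synthetic activations are hypothetical and only illustrate the arithmetic: 19 active positions out of 100,000 gives 0.019%.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold."""
    activations = np.asarray(activations)
    return float(np.mean(activations > threshold))

# Synthetic activations: 100,000 token positions, of which 19 are active.
acts = np.zeros(100_000)
acts[:19] = 0.5
density = activation_density(acts)
print(f"{density:.3%}")  # 0.019%
```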