INDEX
Explanations
the name "Ross" with a very strong activation
mentions of the name "Ross."
Negative Logits
rious         -0.74
à¨            -0.73
brance        -0.69
urated        -0.67
ACTED         -0.67
lder          -0.65
ulhu          -0.64
conspicuous   -0.64
undai         -0.64
ع             -0.63
Positive Logits
bach          1.02
etti          0.96
inson         0.95
olini         0.87
lyn           0.86
aunders       0.86
andowski      0.84
iter          0.80
endale        0.79
ys            0.78
Activations Density 0.019%