Explanations
instances of the word "show"
Negative Logits
kson         -0.65
retaining    -0.63
tymology     -0.62
ades         -0.62
assic        -0.62
proceeding   -0.61
nian         -0.60
captcha      -0.59
oldown       -0.58
ataka        -0.57
Positive Logits
biz          1.14
alter        1.03
manship      0.99
ered         0.91
runners      0.86
downs        0.83
amy          0.78
boat         0.76
rooms        0.75
runner       0.74
Activations Density: 0.064%