INDEX
Explanations
comparisons using the word "like"
instances of the word "like"
Negative Logits
ulty        -0.87
chin        -0.84
hiba        -0.84
inoa        -0.77
ourse       -0.77
Dispatch    -0.76
oard        -0.74
rax         -0.73
onte        -0.73
idates      -0.71
Positive Logits
lihood      1.67
lier        1.02
liest       0.95
ours        0.94
minded      0.90
minded      0.90
liness      0.89
wildfire    0.80
hers        0.76
yours       0.73
Activations Density 0.118%