INDEX
Explanations
comparisons using the word 'like'
the word "like" used in comparisons or similarities
Negative Logits
enthusi    -0.80
ape        -0.75
Americ     -0.74
tein       -0.70
ennes      -0.70
dies       -0.69
duction    -0.67
hiba       -0.66
duct       -0.65
aley       -0.65
Positive Logits
lihood     1.76
liest      1.26
lier       1.22
liness     0.97
minded     0.91
clock      0.77
minded     0.73
ly         0.73
ours       0.69
wildfire   0.64
Activations Density 0.044%