INDEX
Explanations
proper nouns or names
the occurrences of the word "like"
Negative Logits
Token        Logit
ILCS         -0.90
istry        -0.85
istics       -0.70
mable        -0.70
istically    -0.69
aries        -0.68
ONT          -0.67
imester      -0.67
rooms        -0.66
ogyn         -0.64

Positive Logits
Token        Logit
bike          0.99
lihood        0.93
mens          0.87
jriwal        0.87
gged          0.84
ernel         0.81
llan          0.81
yu            0.73
ye            0.71
manship       0.71

Activations Density 0.018%