Explanations
the word "kind" with a high level of activation
phrases referring to different categories or types of things
Negative Logits
VIDEOS     -0.79
UNCH       -0.74
mercial    -0.70
姫         -0.70
oulos      -0.68
覚醒       -0.68
eor        -0.67
edia       -0.66
Minutes    -0.66
borough    -0.65
Positive Logits
lier       0.89
hearted    0.84
liest      0.82
liness     0.78
ifier      0.75
ling       0.74
gesture    0.72
ilege      0.72
prevail    0.68
nered      0.66
Activations Density 0.031%