INDEX
Explanations
mentions of the word "Kind" followed by a contextually relevant word
references to kindness and related themes
Negative Logits
Downloadha   -0.87
uters        -0.70
ptions       -0.70
OPE          -0.69
pez          -0.68
è¦ļéĨĴ       -0.67
ITNESS       -0.66
asts         -0.66
opers        -0.66
IVER         -0.65
Positive Logits
Kind          0.94
kind          0.92
ness          0.84
hearted       0.82
entimes       0.79
liness        0.76
iciary        0.75
lich          0.74
red           0.73
Pair          0.72
Activations Density 0.017%