Explanations
phrases related to mutual interaction or agreement
concepts related to mutual relationships and cooperation
Negative Logits
teenth          -0.76
dq              -0.76
nor             -0.73
nai             -0.71
HOU             -0.66
mble            -0.66
mb              -0.66
————            -0.65
lay             -0.64
Mandatory       -0.64
Positive Logits
admiration      0.95
acquaintance    0.90
masturbation    0.89
aid             0.89
ité             0.87
respect         0.85
understanding   0.82
benefit         0.81
agreement       0.80
distrust        0.78
Activation Density: 0.060%