INDEX
Explanations
phrases indicating willingness for engagement or interaction
expressions that encourage openness and communication
Negative Logits
rament      -0.77
Templ       -0.64
ources      -0.60
Canaver     -0.59
Clement     -0.57
McDonnell   -0.55
¬¼          -0.55
etheless    -0.54
oaded       -0.54
SourceFile  -0.53
Positive Logits
inct        0.74
itsu        0.70
INE         0.66
in          0.63
inal        0.62
inem        0.62
ins         0.62
ine         0.61
inates      0.60
inyl        0.60
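The positive and negative logit lists above rank vocabulary tokens by how strongly the feature pushes their logits up or down. The sketch below shows one common way such lists can be produced. It assumes (this card does not say so) that a feature's logit effect on a token is the dot product of the feature's decoder direction with that token's unembedding vector, ignoring the final layer norm. The function names `logit_effects` and `top_logits` and the toy vectors are hypothetical illustrations, not this dashboard's code.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, float]

def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Hypothetical: dot the feature's decoder direction with each token's
    unembedding vector (assumes no final layer norm is applied)."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

def top_logits(effects: Dict[str, float], k: int) -> Tuple[List[Pair], List[Pair]]:
    """Return the k most positive and k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1])  # ascending
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative

# Toy example with a 2-dimensional residual stream (made-up numbers).
decoder_dir = [1.0, -0.5]
unembed = {"a": [0.5, 0.0], "b": [0.0, 1.0], "c": [1.0, 0.5], "d": [-1.0, 0.0]}
positive, negative = top_logits(logit_effects(decoder_dir, unembed), k=2)
print(positive)  # [('c', 0.75), ('a', 0.5)]
print(negative)  # [('d', -1.0), ('b', -0.5)]
```

In a real model the unembedding would be a matrix of shape (d_model, vocab_size) and the same ranking is usually done with a vectorized top-k, but the logic is the same.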
Activation density: 0.152%
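Activation density is read here as the percentage of sampled token positions on which the feature fires. On that reading, 0.152% is about 152 active tokens per 100,000, or roughly 1 token in 660, so this is a sparse feature. The sketch below computes the figure from a list of per-token activations. The function name `activation_density` and the "fires when activation > 0" threshold are assumptions for illustration; the dashboard may define "active" differently.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of token positions whose activation
    exceeds `threshold` (assumed to be 0 for a ReLU-style SAE feature)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# 152 firing positions out of 100,000 gives the density shown on this card.
acts = [0.0] * (100_000 - 152) + [1.0] * 152
print(activation_density(acts))  # about 0.152
```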