INDEX
Explanations
phrases related to casual conversation and social interactions
dialogues and interactions that include questions or conversational prompts
NEGATIVE LOGITS
prisingly       -0.83
etheless        -0.74
ometimes        -0.72
ricanes         -0.70
surprisingly    -0.70
uitive          -0.68
asive           -0.65
ãĤ´ãĥ³          -0.62
mittedly        -0.61
eatures         -0.61
POSITIVE LOGITS
'"      1.69
]"      1.47
.")     1.46
"]      1.44
>"      1.42
")      1.38
}"      1.38
").     1.37
',"     1.37
â̦"     1.34
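Some entries in the logit lists, such as ãĤ´ãĥ³, look like encoding debris but are more likely tokens shown in a byte-level vocabulary form. The sketch below assumes a GPT-2 style byte-to-character table (the dashboard does not name its tokenizer, so this is an assumption) and maps such a string back to readable text. The names `bytes_to_unicode` and `decode_token` are illustrative, not part of the dashboard.

```python
def bytes_to_unicode():
    """Build the GPT-2 style table mapping each byte to a printable character.

    Assumption: the vocabulary uses this byte-level scheme.
    """
    # Bytes that are already printable keep their own code point.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Convert a byte-level token string back to UTF-8 text."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


# The token from the negative logits list, written with escapes for safety.
print(decode_token("\u00e3\u0124\u00b4\u00e3\u0125\u00b3"))  # ゴン
```

Plain ASCII fragments such as "surprisingly" pass through unchanged. A character outside the table raises `KeyError`, which usually means the string was damaged after it left the tokenizer.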
Activations Density 0.383%
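As a rough illustration of the density figure, the sketch below treats it as the fraction of tokens on which the feature has a nonzero activation. That definition is an assumption, as are the function name `activation_density` and the sample data; the dashboard does not state how the value is computed.

```python
def activation_density(activations):
    """Fraction of tokens with a positive activation (assumed definition)."""
    total = len(activations)
    if total == 0:
        return 0.0
    fired = sum(1 for a in activations if a > 0)
    return fired / total


# Hypothetical sample: the feature fires on 3 of 1000 tokens.
sample = [0.0] * 997 + [1.2, 0.4, 3.1]
print(f"{activation_density(sample):.3%}")  # 0.300%
```

Under this reading, a density of 0.383% would mean the feature is active on roughly 4 tokens in every 1000.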