INDEX
Explanations
phrases related to asking questions and seeking information
determinations of capability or potential actions
Negative Logits
honoured      -0.62
Gork          -0.58
apologised    -0.58
...           -0.57
Wellington    -0.56
flavours      -0.56
recognised    -0.55
flavour       -0.54
realised      -0.54
isations      -0.52
Positive Logits
âĢ            1.85
»             1.61
ãĢ            1.57
âľ            1.51
âĹı           1.49
¨             1.48
«             1.47
</            1.42
âĺ            1.40
————————      1.38
Activations Density: 3.068%