INDEX
Explanations
specific phrases or keywords related to instructions or prompts
sentence starters or common phrases indicative of dialogue and narrative structure
Negative Logits
}}        -0.49
[|        -0.47
referen   -0.46
idated    -0.46
ebook     -0.45
arsity    -0.44
CLSID     -0.44
Moroc     -0.44
代        -0.44
Polo      -0.44
Positive Logits
etheless   0.69
ktop       0.64
mosp       0.64
resa       0.62
swers      0.62
xiety      0.59
zbollah    0.58
Voice      0.56
jriwal     0.54
notations  0.54
Activations Density 0.589%