Explanations
contextual cues like sentence endings or structural elements such as quotation marks
punctuation marks and patterns related to dialogue or quoted speech
Negative Logits
=]            -0.62
idth          -0.61
watered       -0.59
userc         -0.59
iliated       -0.58
ゼウス        -0.57
designated    -0.57
disapprove    -0.57
antit         -0.56
aditional     -0.56
Positive Logits
Researchers    0.80
SPONSORED      0.80
Dur            0.78
Labour         0.77
RESULTS        0.75
Press          0.75
Students       0.74
Finally        0.74
Shape          0.74
¶              0.72
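The two ranked lists above pair each token with a score and show the ten lowest and ten highest. A minimal sketch of that selection step follows, assuming the page already has a per-token score for the whole vocabulary (how those scores are computed is not stated on the page and is not modeled here). `top_and_bottom_logits` is a hypothetical helper, not part of any dashboard API.

```python
from typing import Dict, List, Tuple


def top_and_bottom_logits(
    scores: Dict[str, float], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, score) pairs.

    `scores` is an assumed mapping from vocabulary token to the feature's
    effect on that token's logit; it is a hypothetical input.
    """
    # Ascending order: the most negative scores come first.
    negative = sorted(scores.items(), key=lambda kv: kv[1])[:k]
    # Descending order; Python's sort stays stable with reverse=True,
    # so tied tokens keep their original relative order.
    positive = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return negative, positive


if __name__ == "__main__":
    # A few entries copied from the lists above, used as toy input.
    sample = {"=]": -0.62, "idth": -0.61, "Researchers": 0.80,
              "SPONSORED": 0.80, "Dur": 0.78}
    neg, pos = top_and_bottom_logits(sample, k=2)
    print(neg)
    print(pos)
```

With `k=2` the negative side is `=]` then `idth`, and the positive side is the two tokens scored 0.80.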
Activation Density: 0.886%
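Activation density is read here as the share of evaluated tokens on which the feature fires. The page does not give its exact definition, so the sketch below assumes "fires" means an activation strictly above a threshold of 0. `activation_density` is a hypothetical helper for illustration only.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumes density = (tokens with activation > threshold) / (all tokens),
    expressed as a percentage; this definition is an assumption.
    """
    total = 0
    firing = 0
    for a in activations:
        total += 1
        if a > threshold:
            firing += 1
    if total == 0:
        raise ValueError("no activations provided")
    return 100.0 * firing / total


if __name__ == "__main__":
    # Toy data: 886 firing tokens out of 100,000 gives a density near 0.886%.
    acts = [1.0] * 886 + [0.0] * 99_114
    print(f"{activation_density(acts):.3f}%")
```

Under this reading, 0.886% corresponds to roughly 886 firing tokens per 100,000 evaluated, so the feature is sparse.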