INDEX
Explanations
phrases starting with "While"
phrases that introduce contrasting ideas or conditions
Negative Logits
ISE       -0.81
isable    -0.79
ise       -0.77
aer       -0.73
irs       -0.71
omet      -0.70
iotic     -0.68
atron     -0.68
romeda    -0.68
arse      -0.67
Positive Logits
researching      0.87
acknowledging    0.81
browsing         0.80
discussing       0.77
compiling        0.76
agreeing         0.76
commenting       0.73
attending        0.73
catentry         0.70
evaluating       0.70
Activations Density 0.034%