INDEX
Explanations
questions within a text
questions throughout the text
Negative Logits
carbohyd      -0.81
affili        -0.77
referen       -0.76
transition    -0.76
exting        -0.72
corrid        -0.71
bonded        -0.71
slightest     -0.70
merging       -0.70
ikuman        -0.69
Positive Logits
Well          1.78
Probably      1.44
Firstly       1.37
Plenty        1.32
Quite         1.32
Apparently    1.32
Surprisingly  1.31
Surely        1.30
Well          1.29
Simple        1.29
Activations Density 0.116%