Explanations
questions ending in a question mark
questions related to the consequences or implications of various topics
Negative Logits
agos        -0.78
carbohyd    -0.78
transition  -0.73
slightest   -0.72
rael        -0.71
exting      -0.70
rontal      -0.68
fused       -0.67
bonded      -0.67
oki         -0.67
Positive Logits
Well        1.77
Firstly     1.41
Well        1.37
Probably    1.34
Quite       1.33
Apparently  1.33
Surely      1.32
Plenty      1.32
Answer      1.30
Certainly   1.28
Activations Density: 0.082%