INDEX
Explanations
phrases indicating consequences or outcomes
phrases indicating causal relationship outcomes
Negative Logits
lineback       -0.70
butterflies    -0.70
Brewers        -0.67
spots          -0.66
roofs          -0.65
Mariners       -0.63
Majesty        -0.61
ockets         -0.60
tera           -0.60
captains       -0.60
Positive Logits
ainer             0.79
uary              0.76
thereof           0.76
DragonMagazine    0.75
ulator            0.72
gha               0.71
ãĥĥãĥī            0.71
uration           0.70
ivity             0.69
Reviewer          0.69
Activations Density 0.017%