INDEX
Explanations
phrases introducing or explaining concepts or ideas
phrases indicating consequences, benefits, or explanations in a discussion
Negative Logits
eric      -0.68
onics     -0.62
hell      -0.60
psc       -0.59
onement   -0.58
raphic    -0.58
lems      -0.57
raid      -0.56
borg      -0.56
ppers     -0.55
Positive Logits
involves        0.87
relates         0.85
overlooked      0.85
limitation      0.77
includes        0.75
arises          0.74
excludes        0.73
pecul           0.71
contributing    0.70
distinguishes   0.70
Activations Density: 0.142%