Explanations
phrases related to introducing topics or transitioning between topics
references to topics being introduced or discussed
Negative Logits
Token       Logit
prus        -0.80
phia        -0.79
Ctrl        -0.78
æ©          -0.73
eer         -0.71
ĸļ          -0.71
earchers    -0.69
bike        -0.67
opter       -0.66
atra        -0.64
Positive Logits
Token          Logit
seriousness    0.64
specifics      0.64
matters        0.63
nutshell       0.63
gin            0.62
external       0.61
cases          0.60
downside       0.60
ado            0.59
wards          0.59
Activations Density: 0.092%