INDEX
Explanations
instances where someone is describing a situation or action by mentioning a topic and pointing out another topic with a high level of importance
instances of the word "which" used to introduce relative clauses or explanations
Negative Logits
apult       -0.65
ctor        -0.65
CT          -0.64
HQ          -0.62
³³³³³³³³    -0.61
Tracker     -0.61
rolet       -0.61
Problem     -0.60
et          -0.60
redit       -0.60
Positive Logits
soever      0.95
guts        0.77
upon        0.73
case        0.72
adoes       0.71
abama       0.70
ĸļ          0.67
xual        0.67
dstg        0.63
andom       0.62
Activations Density 0.035%