INDEX
Explanations
specific phrasing indicating a statement or claim being made
claims or propositions that introduce a concept or idea requiring validation
Negative Logits
ahime    -0.74
ugu      -0.70
onday    -0.68
————     -0.65
tu       -0.64
iola     -0.63
ached    -0.62
cms      -0.61
ikuman   -0.60
insert   -0.60
Positive Logits
horr           1.11
haun           1.00
hasn           1.00
begs           0.97
occurs         0.97
bothers        0.97
deserves       0.96
distinguishes  0.95
persists       0.95
seems          0.94
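The two lists can be read as the ten lowest and ten highest entries of the feature's per-token logit score. The short Python sketch below shows that selection. It assumes the scores are already available as a mapping from token to value and uses only a small subset of the numbers listed here. How the dashboard computes the scores is not stated on this page, and `top_bottom_k` is a hypothetical helper.

```python
def top_bottom_k(effects: dict[str, float], k: int) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs.

    Hypothetical helper: assumes `effects` already maps each token to its logit score.
    """
    # Sort ascending by score; Python's sort is stable, so ties keep insertion order.
    ranked = sorted(effects.items(), key=lambda item: item[1])
    bottom = ranked[:k]                # most negative first
    top = list(reversed(ranked[-k:]))  # most positive first
    return bottom, top


# A subset of the scores shown on this page, for illustration only.
effects = {"ahime": -0.74, "ugu": -0.70, "onday": -0.68,
           "horr": 1.11, "haun": 1.00, "seems": 0.94}
bottom, top = top_bottom_k(effects, 2)
print(bottom)  # [('ahime', -0.74), ('ugu', -0.7)]
print(top)     # [('horr', 1.11), ('haun', 1.0)]
```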
Activation Density: 0.151%
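A density of 0.151% means the feature fires on roughly 1.5 of every 1,000 tokens. The minimal sketch below computes that figure. It assumes density is the percentage of token positions with an activation strictly above zero. The page states neither the threshold nor the token sample it used, and `activation_density` and the toy data are hypothetical.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: "active" means strictly greater than the threshold (default 0).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical toy data: the feature fires at 3 of 2,000 token positions.
acts = [0.0] * 2000
for i in (10, 500, 1999):
    acts[i] = 1.2
print(activation_density(acts))  # 0.15
```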