Explanations
phrases related to asserting or demonstrating something to be true
phrases related to proving something, particularly in contexts of innocence or correctness
Negative Logits
etting   -0.71
ussen    -0.68
artment  -0.66
ussed    -0.64
Variant  -0.63
isions   -0.61
osp      -0.60
livest   -0.60
hner     -0.59
erton    -0.58
Positive Logits
convinc       0.90
otherwise     0.85
innocence     0.82
correctness   0.82
pudding       0.80
beyond        0.80
validity      0.78
definitively  0.78
causation     0.77
doub          0.75
Activation Density 0.081%