Explanations
examples or instances in a text
Negative Logits
ief        -0.74
ouls       -0.72
ternity    -0.71
ole        -0.71
ties       -0.69
resent     -0.69
ird        -0.67
ensibly    -0.67
rice       -0.66
nuts       -0.66
Positive Logits
illustrating     1.09
examples         0.95
wcsstore         0.85
subp             0.84
thereof          0.83
demonstrating    0.82
illustration     0.81
illustrate       0.80
Examples         0.78
example          0.78
Activations Density 0.543%