INDEX
Explanations
references to factual information in a text
repeated references to "facts" in various contexts
Negative Logits
yss          -0.92
ovo          -0.84
osi          -0.82
isoft        -0.80
antha        -0.79
zik          -0.78
rint         -0.77
ateurs       -0.76
irl          -0.76
hod          -0.76
Positive Logits
facts         0.94
facts         0.93
heet          0.87
fulness       0.83
inacc         0.82
fact          0.82
telling       0.78
unfold        0.72
pertaining    0.71
relating      0.69
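Positive and negative logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the vocabulary by the resulting score. The sketch below assumes the viewer works this way, which the dashboard itself does not state; the toy vocabulary, decoder vector, and unembedding weights are hypothetical, and any normalization the viewer applies to its displayed scores is not reproduced here.

```python
from typing import List, Tuple


def logit_effects(decoder_dir: List[float], unembed: List[List[float]]) -> List[float]:
    """Score every vocabulary token by projecting the decoder direction onto W_U.

    decoder_dir: the feature's decoder vector, length d_model (hypothetical input).
    unembed: W_U as d_model rows, each of length vocab_size.
    Returns one score per token t: sum_i decoder_dir[i] * unembed[i][t].
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(
    scores: List[float], vocab: List[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative score first
    positive = ranked[::-1][:k]  # most positive score first
    return positive, negative


# Hypothetical toy model: d_model = 2, four-token vocabulary.
vocab = ["facts", "fact", "yss", "ovo"]
decoder_dir = [1.0, 0.5]
unembed = [
    [0.6, 0.5, -0.4, -0.3],
    [0.6, 0.4, -1.0, -0.8],
]
positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), vocab, k=2)
print("positive:", [tok for tok, _ in positive])  # ['facts', 'fact']
print("negative:", [tok for tok, _ in negative])  # ['yss', 'ovo']
```

Because the score is a plain dot product, it describes which output tokens the feature pushes up or down directly, independent of any particular input text.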
Activation Density: 0.022%
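Activation density is usually read as the percentage of tokens in a sample on which the feature fires, so 0.022% corresponds to roughly 2 tokens in every 10,000. The minimal sketch below assumes "fires" means an activation strictly above a threshold of 0.0; the threshold and the synthetic activation list are hypothetical, and the viewer's actual sampling corpus is not given here.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations provided")
    return 100.0 * active / total


# Hypothetical sample: the feature fires on 22 of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(0, 22_000, 1000):  # 22 positions: 0, 1000, ..., 21000
    acts[i] = 1.5
print(f"{activation_density(acts):.3f}%")  # 0.022%
```

A density this low indicates a sparse feature, which is consistent with the narrow "facts" explanations listed above.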