INDEX
Explanations
statements or phrases emphasizing the concept of "fact" or factual information
Negative Logits
acht      -0.14
ULD       -0.14
cope      -0.14
stakes    -0.14
ates      -0.13
feld      -0.13
Alb       -0.13
lop       -0.13
addition  -0.13
express   -0.13
Positive Logits
itious    0.22
fact      0.20
ually     0.20
uality    0.19
avana     0.17
amarin    0.16
aval      0.16
ories     0.16
Fact      0.16
urb       0.15
Activations Density: 0.016%