INDEX
Explanations
words or phrases related to credibility
references to the concept of credibility
Negative Logits
Token     Logit
uv        -0.93
frey      -0.82
hop       -0.72
hib       -0.72
berry     -0.71
mun       -0.69
Shop      -0.68
ulton     -0.68
Pt        -0.68
akings    -0.67
Positive Logits
Token         Logit
credible      1.30
conclud       0.99
allegation    0.86
redible       0.86
unbeliev      0.85
referen       0.85
trustworthy   0.84
adversary     0.84
contender     0.84
accuser       0.84
Activations Density: 0.009%