INDEX
Explanations
be aware of references to a stated or presumed truth
mentions of the concept of "fact"
Negative Logits
avorite         -0.81
livest          -0.75
throats         -0.73
throat          -0.70
airs            -0.67
itsch           -0.66
interstitial    -0.65
lungs           -0.65
ESE             -0.60
incinn          -0.60
Positive Logits
ually           1.14
orial           1.10
uality          1.07
ional           0.99
itious          0.98
oids            0.95
uate            0.87
uated           0.83
finding         0.82
icity           0.81
Activations Density: 0.024%