Explanations
terms related to reality and truth
references to "reality" and its various implications
Negative Logits
ucky      -0.89
edo       -0.80
indal     -0.80
asus      -0.78
oyal      -0.78
asso      -0.78
atto      -0.75
rav       -0.75
artney    -0.75
oug       -0.75
Positive Logits
istically     0.93
psons         0.84
ignment       0.83
ually         0.83
TV            0.79
Lange         0.76
reality       0.76
tv            0.75
conformity    0.73
check         0.72
Activations Density: 0.038%