Explanations
mentions of the word "reality"
references to the concept of reality
Negative Logits
asus      -0.83
edo       -0.80
asso      -0.80
artney    -0.77
ucky      -0.77
oug       -0.73
edin      -0.73
atin      -0.72
incinn    -0.72
rav       -0.70
Positive Logits
psons        0.95
istically    0.87
reality      0.81
ually        0.78
ignment      0.78
TV           0.75
Lange        0.74
tv           0.74
reality      0.74
fulness      0.73
Activations Density: 0.030%