Explanations
references to the concept of reality
mentions of "reality" and its implications
Negative Logits
ucky        -0.87
tein        -0.81
edo         -0.79
incinn      -0.77
asus        -0.76
kers        -0.75
edin        -0.74
asso        -0.74
ongh        -0.74
oyal        -0.73
Positive Logits
TV          0.85
ually       0.83
ignment     0.83
psons       0.83
tv          0.80
check       0.79
istically   0.78
Ens         0.77
Lange       0.76
conformity  0.75
Activation Density: 0.029%