INDEX
Explanations
terms related to various forms of veracity or truth
Negative Logits
Token     Logit
undos     -0.15
odÄĽ      -0.15
vero      -0.15
ver       -0.14
ìĭĿ       -0.14
ertext    -0.14
igan      -0.14
abis      -0.14
vero      -0.14
iš        -0.13
Positive Logits
Token     Logit
reas      0.17
arming    0.15
rost      0.15
interop   0.15
decess    0.14
zier      0.14
erb       0.14
ovel      0.14
foy       0.13
htm       0.13
Activations Density: 0.014%