Explanations
statements related to significance and the importance of various concepts or experiences
Followed by an infinitive ("to") or a comma
importance and significance
Negative Logits
ſelf              -0.62
ſta               -0.58
الحياه            -0.57
ModelExpression   -0.56
$_"               -0.54
ValueStyle        -0.52
:✨                -0.52
لينك              -0.49
ſte               -0.49
Gemeinden         -0.49
Positive Logits
very              0.45
OGND              0.40
.                 0.40
dwind             0.38
unfortunately     0.37
AccessorTable     0.36
gyhoeddwyd        0.36
shown             0.36
extremely         0.35
Dyck              0.35
Activations Density: 1.249%