INDEX
Explanations
numbers and their relationships in a context related to actions or decisions
Negative Logits
Token     Logit
Rie       -0.63
teis      -0.57
Lap       -0.56
της       -0.53
знаешь    -0.53
a         -0.52
coper     -0.51
usza      -0.51
áng       -0.50
Basi      -0.50
Positive Logits
Token         Logit
myſelf        1.17
Reſ           0.98
uſe           0.95
themſelves    0.94
للمعارف       0.93
purpoſe       0.92
himſelf       0.91
cauſe         0.91
againſt       0.90
fhew          0.89
Activations Density: 0.154%